Commit	Line	Data
	1	\input texinfo @c --texinfo--
	2	@comment %**start of header
	3	@setfilename bison.info
	4	@include version.texi
	5	@settitle Bison @value{VERSION}
	6	@setchapternewpage odd
	7
	8	@finalout
	9
	10	@c SMALL BOOK version
	11	@c This edition has been formatted so that you can format and print it in
	12	@c the smallbook format.
	13	@c @smallbook
	14
	15	@c Set following if you have the new `shorttitlepage' command
	16	@c @clear shorttitlepage-enabled
	17	@c @set shorttitlepage-enabled
	18
	19	@c ISPELL CHECK: done, 14 Jan 1993 --bob
	20
	21	@c Check COPYRIGHT dates. should be updated in the titlepage, ifinfo
	22	@c titlepage; should NOT be changed in the GPL. --mew
	23
	24	@c FIXME: I don't understand this `iftex'. Obsolete? --akim.
	25	@iftex
	26	@syncodeindex fn cp
	27	@syncodeindex vr cp
	28	@syncodeindex tp cp
	29	@end iftex
	30	@ifinfo
	31	@synindex fn cp
	32	@synindex vr cp
	33	@synindex tp cp
	34	@end ifinfo
	35	@comment %**end of header
	36
	37	@ifinfo
	38	@format
	39	START-INFO-DIR-ENTRY
	40	* bison: (bison). GNU Project parser generator (yacc replacement).
	41	END-INFO-DIR-ENTRY
	42	@end format
	43	@end ifinfo
	44
	45	@ifinfo
	46	This file documents the Bison parser generator.
	47
	48	Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999,
	49	2000, 2001, 2002
	50	Free Software Foundation, Inc.
	51
	52	Permission is granted to make and distribute verbatim copies of
	53	this manual provided the copyright notice and this permission notice
	54	are preserved on all copies.
	55
	56	@ignore
	57	Permission is granted to process this file through TeX and print the
	58	results, provided the printed document carries copying permission
	59	notice identical to this one except for the removal of this paragraph
	60	(this paragraph not being relevant to the printed manual).
	61
	62	@end ignore
	63	Permission is granted to copy and distribute modified versions of this
	64	manual under the conditions for verbatim copying, provided also that the
	65	sections entitled ``GNU General Public License'' and ``Conditions for
	66	Using Bison'' are included exactly as in the original, and provided that
	67	the entire resulting derived work is distributed under the terms of a
	68	permission notice identical to this one.
	69
	70	Permission is granted to copy and distribute translations of this manual
	71	into another language, under the above conditions for modified versions,
	72	except that the sections entitled ``GNU General Public License'',
	73	``Conditions for Using Bison'' and this permission notice may be
	74	included in translations approved by the Free Software Foundation
	75	instead of in the original English.
	76	@end ifinfo
	77
	78	@ifset shorttitlepage-enabled
	79	@shorttitlepage Bison
	80	@end ifset
	81	@titlepage
	82	@title Bison
	83	@subtitle The YACC-compatible Parser Generator
	84	@subtitle @value{UPDATED}, Bison Version @value{VERSION}
	85
	86	@author by Charles Donnelly and Richard Stallman
	87
	88	@page
	89	@vskip 0pt plus 1filll
	90	Copyright @copyright{} 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998,
	91	1999, 2000, 2001, 2002
	92	Free Software Foundation, Inc.
	93
	94	@sp 2
	95	Published by the Free Software Foundation @*
	96	59 Temple Place, Suite 330 @*
	97	Boston, MA 02111-1307 USA @*
	98	Printed copies are available from the Free Software Foundation.@*
	99	ISBN 1-882114-44-2
	100
	101	Permission is granted to make and distribute verbatim copies of
	102	this manual provided the copyright notice and this permission notice
	103	are preserved on all copies.
	104
	105	@ignore
	106	Permission is granted to process this file through TeX and print the
	107	results, provided the printed document carries copying permission
	108	notice identical to this one except for the removal of this paragraph
	109	(this paragraph not being relevant to the printed manual).
	110
	111	@end ignore
	112	Permission is granted to copy and distribute modified versions of this
	113	manual under the conditions for verbatim copying, provided also that the
	114	sections entitled ``GNU General Public License'' and ``Conditions for
	115	Using Bison'' are included exactly as in the original, and provided that
	116	the entire resulting derived work is distributed under the terms of a
	117	permission notice identical to this one.
	118
	119	Permission is granted to copy and distribute translations of this manual
	120	into another language, under the above conditions for modified versions,
	121	except that the sections entitled ``GNU General Public License'',
	122	``Conditions for Using Bison'' and this permission notice may be
	123	included in translations approved by the Free Software Foundation
	124	instead of in the original English.
	125	@sp 2
	126	Cover art by Etienne Suvasa.
	127	@end titlepage
	128
	129	@contents
	130
	131	@ifnottex
	132	@node Top
	133	@top Bison
	134
	135	This manual documents version @value{VERSION} of Bison, updated
	136	@value{UPDATED}.
	137	@end ifnottex
	138
	139	@menu
	140	* Introduction::
	141	* Conditions::
	142	* Copying:: The GNU General Public License says
	143	how you can copy and share Bison
	144
	145	Tutorial sections:
	146	* Concepts:: Basic concepts for understanding Bison.
	147	* Examples:: Three simple explained examples of using Bison.
	148
	149	Reference sections:
	150	* Grammar File:: Writing Bison declarations and rules.
	151	* Interface:: C-language interface to the parser function @code{yyparse}.
	152	* Algorithm:: How the Bison parser works at run-time.
	153	* Error Recovery:: Writing rules for error recovery.
	154	* Context Dependency:: What to do if your language syntax is too
	155	messy for Bison to handle straightforwardly.
	156	* Debugging:: Understanding or debugging Bison parsers.
	157	* Invocation:: How to run Bison (to produce the parser source file).
	158	* Table of Symbols:: All the keywords of the Bison language are explained.
	159	* Glossary:: Basic concepts are explained.
	160	* Copying This Manual:: License for copying this manual.
	161	* Index:: Cross-references to the text.
	162
	163	@detailmenu --- The Detailed Node Listing ---
	164
	165	The Concepts of Bison
	166
	167	* Language and Grammar:: Languages and context-free grammars,
	168	as mathematical ideas.
	169	* Grammar in Bison:: How we represent grammars for Bison's sake.
	170	* Semantic Values:: Each token or syntactic grouping can have
	171	a semantic value (the value of an integer,
	172	the name of an identifier, etc.).
	173	* Semantic Actions:: Each rule can have an action containing C code.
	174	* Bison Parser:: What are Bison's input and output,
	175	how is the output used?
	176	* Stages:: Stages in writing and running Bison grammars.
	177	* Grammar Layout:: Overall structure of a Bison grammar file.
	178
	179	Examples
	180
	181	* RPN Calc:: Reverse Polish notation calculator;
	182	a first example with no operator precedence.
	183	* Infix Calc:: Infix (algebraic) notation calculator.
	184	Operator precedence is introduced.
	185	* Simple Error Recovery:: Continuing after syntax errors.
	186	* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
	187	* Multi-function Calc:: Calculator with memory and trig functions.
	188	It uses multiple data-types for semantic values.
	189	* Exercises:: Ideas for improving the multi-function calculator.
	190
	191	Reverse Polish Notation Calculator
	192
	193	* Decls: Rpcalc Decls. Prologue (declarations) for rpcalc.
	194	* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
	195	* Lexer: Rpcalc Lexer. The lexical analyzer.
	196	* Main: Rpcalc Main. The controlling function.
	197	* Error: Rpcalc Error. The error reporting function.
	198	* Gen: Rpcalc Gen. Running Bison on the grammar file.
	199	* Comp: Rpcalc Compile. Run the C compiler on the output code.
	200
	201	Grammar Rules for @code{rpcalc}
	202
	203	* Rpcalc Input::
	204	* Rpcalc Line::
	205	* Rpcalc Expr::
	206
	207	Location Tracking Calculator: @code{ltcalc}
	208
	209	* Decls: Ltcalc Decls. Bison and C declarations for ltcalc.
	210	* Rules: Ltcalc Rules. Grammar rules for ltcalc, with explanations.
	211	* Lexer: Ltcalc Lexer. The lexical analyzer.
	212
	213	Multi-Function Calculator: @code{mfcalc}
	214
	215	* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
	216	* Rules: Mfcalc Rules. Grammar rules for the calculator.
	217	* Symtab: Mfcalc Symtab. Symbol table management subroutines.
	218
	219	Bison Grammar Files
	220
	221	* Grammar Outline:: Overall layout of the grammar file.
	222	* Symbols:: Terminal and nonterminal symbols.
	223	* Rules:: How to write grammar rules.
	224	* Recursion:: Writing recursive rules.
	225	* Semantics:: Semantic values and actions.
	226	* Declarations:: All kinds of Bison declarations are described here.
	227	* Multiple Parsers:: Putting more than one Bison parser in one program.
	228
	229	Outline of a Bison Grammar
	230
	231	* Prologue:: Syntax and usage of the prologue (declarations section).
	232	* Bison Declarations:: Syntax and usage of the Bison declarations section.
	233	* Grammar Rules:: Syntax and usage of the grammar rules section.
	234	* Epilogue:: Syntax and usage of the epilogue (additional code section).
	235
	236	Defining Language Semantics
	237
	238	* Value Type:: Specifying one data type for all semantic values.
	239	* Multiple Types:: Specifying several alternative data types.
	240	* Actions:: An action is the semantic definition of a grammar rule.
	241	* Action Types:: Specifying data types for actions to operate on.
	242	* Mid-Rule Actions:: Most actions go at the end of a rule.
	243	This says when, why and how to use the exceptional
	244	action in the middle of a rule.
	245
	246	Bison Declarations
	247
	248	* Token Decl:: Declaring terminal symbols.
	249	* Precedence Decl:: Declaring terminals with precedence and associativity.
	250	* Union Decl:: Declaring the set of all semantic value types.
	251	* Type Decl:: Declaring the choice of type for a nonterminal symbol.
	252	* Expect Decl:: Suppressing warnings about shift/reduce conflicts.
	253	* Start Decl:: Specifying the start symbol.
	254	* Pure Decl:: Requesting a reentrant parser.
	255	* Decl Summary:: Table of all Bison declarations.
	256
	257	Parser C-Language Interface
	258
	259	* Parser Function:: How to call @code{yyparse} and what it returns.
	260	* Lexical:: You must supply a function @code{yylex}
	261	which reads tokens.
	262	* Error Reporting:: You must supply a function @code{yyerror}.
	263	* Action Features:: Special features for use in actions.
	264
	265	The Lexical Analyzer Function @code{yylex}
	266
	267	* Calling Convention:: How @code{yyparse} calls @code{yylex}.
	268	* Token Values:: How @code{yylex} must return the semantic value
	269	of the token it has read.
	270	* Token Positions:: How @code{yylex} must return the text position
	271	(line number, etc.) of the token, if the
	272	actions want that.
	273	* Pure Calling:: How the calling convention differs
	274	in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
	275
	276	The Bison Parser Algorithm
	277
	278	* Look-Ahead:: Parser looks one token ahead when deciding what to do.
	279	* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
	280	* Precedence:: Operator precedence works by resolving conflicts.
	281	* Contextual Precedence:: When an operator's precedence depends on context.
	282	* Parser States:: The parser is a finite-state machine with a stack.
	283	* Reduce/Reduce:: When two rules are applicable in the same situation.
	284	* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
	285	* Stack Overflow:: What happens when the stack gets full. How to avoid it.
	286
	287	Operator Precedence
	288
	289	* Why Precedence:: An example showing why precedence is needed.
	290	* Using Precedence:: How to specify precedence in Bison grammars.
	291	* Precedence Examples:: How these features are used in the previous example.
	292	* How Precedence:: How they work.
	293
	294	Handling Context Dependencies
	295
	296	* Semantic Tokens:: Token parsing can depend on the semantic context.
	297	* Lexical Tie-ins:: Token parsing can depend on the syntactic context.
	298	* Tie-in Recovery:: Lexical tie-ins have implications for how
	299	error recovery rules must be written.
	300
	301	Understanding or Debugging Your Parser
	302
	303	* Understanding:: Understanding the structure of your parser.
	304	* Tracing:: Tracing the execution of your parser.
	305
	306	Invoking Bison
	307
	308	* Bison Options:: All the options described in detail,
	309	in alphabetical order by short options.
	310	* Option Cross Key:: Alphabetical list of long options.
	311	* VMS Invocation:: Bison command syntax on VMS.
	312
	313	Copying This Manual
	314
	315	* GNU Free Documentation License:: License for copying this manual.
	316
	317	@end detailmenu
	318	@end menu
	319
	320	@node Introduction
	321	@unnumbered Introduction
	322	@cindex introduction
	323
	324	@dfn{Bison} is a general-purpose parser generator that converts a
	325	grammar description for an LALR(1) context-free grammar into a C
	326	program to parse that grammar. Once you are proficient with Bison,
	327	you may use it to develop a wide range of language parsers, from those
	328	used in simple desk calculators to those for complex programming languages.
	329
	330	Bison is upward compatible with Yacc: all properly written Yacc grammars
	331	ought to work with Bison with no change. Anyone familiar with Yacc
	332	should be able to use Bison with little trouble. You need to be fluent in
	333	C programming in order to use Bison or to understand this manual.
	334
	335	We begin with tutorial chapters that explain the basic concepts of using
	336	Bison and show three examples with explanations, each building on the last. If you
	337	don't know Bison or Yacc, start by reading these chapters. Reference
	338	chapters follow which describe specific aspects of Bison in detail.
	339
	340	Bison was written primarily by Robert Corbett; Richard Stallman made it
	341	Yacc-compatible. Wilfred Hansen of Carnegie Mellon University added
	342	multi-character string literals and other features.
	343
	344	This edition corresponds to version @value{VERSION} of Bison.
	345
	346	@node Conditions
	347	@unnumbered Conditions for Using Bison
	348
	349	As of Bison version 1.24, we have changed the distribution terms for
	350	@code{yyparse} to permit using Bison's output in nonfree programs.
	351	Formerly, Bison parsers could be used only in programs that were free
	352	software.
	353
	354	The other GNU programming tools, such as the GNU C compiler, have never
	355	had such a requirement. They could always be used for nonfree
	356	software. The reason Bison was different was not due to a special
	357	policy decision; it resulted from applying the usual General Public
	358	License to all of the Bison source code.
	359
	360	The output of the Bison utility---the Bison parser file---contains a
	361	verbatim copy of a sizable piece of Bison, which is the code for the
	362	@code{yyparse} function. (The actions from your grammar are inserted
	363	into this function at one point, but the rest of the function is not
	364	changed.) When we applied the GPL terms to the code for @code{yyparse},
	365	the effect was to restrict the use of Bison output to free software.
	366
	367	We didn't change the terms because of sympathy for people who want to
	368	make software proprietary. @strong{Software should be free.} But we
	369	concluded that limiting Bison's use to free software was doing little to
	370	encourage people to make other software free. So we decided to make the
	371	practical conditions for using Bison match the practical conditions for
	372	using the other GNU tools.
	373
	374	@include gpl.texi
	375
	376	@node Concepts
	377	@chapter The Concepts of Bison
	378
	379	This chapter introduces many of the basic concepts without which the
	380	details of Bison will not make sense. If you do not already know how to
	381	use Bison or Yacc, we suggest you start by reading this chapter carefully.
	382
	383	@menu
	384	* Language and Grammar:: Languages and context-free grammars,
	385	as mathematical ideas.
	386	* Grammar in Bison:: How we represent grammars for Bison's sake.
	387	* Semantic Values:: Each token or syntactic grouping can have
	388	a semantic value (the value of an integer,
	389	the name of an identifier, etc.).
	390	* Semantic Actions:: Each rule can have an action containing C code.
	391	* Locations Overview:: Tracking Locations.
	392	* Bison Parser:: What are Bison's input and output,
	393	how is the output used?
	394	* Stages:: Stages in writing and running Bison grammars.
	395	* Grammar Layout:: Overall structure of a Bison grammar file.
	396	@end menu
	397
	398	@node Language and Grammar
	399	@section Languages and Context-Free Grammars
	400
	401	@cindex context-free grammar
	402	@cindex grammar, context-free
	403	In order for Bison to parse a language, it must be described by a
	404	@dfn{context-free grammar}. This means that you specify one or more
	405	@dfn{syntactic groupings} and give rules for constructing them from their
	406	parts. For example, in the C language, one kind of grouping is called an
	407	`expression'. One rule for making an expression might be, ``An expression
	408	can be made of a minus sign and another expression''. Another would be,
	409	``An expression can be an integer''. As you can see, rules are often
	410	recursive, but there must be at least one rule which leads out of the
	411	recursion.
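
As a rough illustration of the two rules above, here is a minimal C sketch
of a hand-written recognizer. It is not how Bison works internally (Bison
generates a table-driven parser); it only shows how a recursive rule and a
non-recursive rule combine. The token codes and the function name
@code{match_expression} are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes, used only in this sketch. */
enum token { TOK_END = 0, TOK_MINUS, TOK_INTEGER };

/* Try to match one `expression' at the start of TOKENS, an array that
   ends with TOK_END.  Return the number of tokens matched, or 0 if the
   tokens do not start with an expression.

   Rule 1: an expression can be a minus sign and another expression.
   Rule 2: an expression can be an integer (this rule leads out of the
           recursion).  */
size_t
match_expression (const enum token *tokens)
{
  assert (tokens != NULL);

  if (tokens[0] == TOK_MINUS)
    {
      /* Rule 1: recurse on what follows the minus sign.  */
      size_t rest = match_expression (tokens + 1);
      return rest ? rest + 1 : 0;
    }

  if (tokens[0] == TOK_INTEGER)
    return 1;                   /* Rule 2: no further recursion.  */

  return 0;                     /* Neither rule applies.  */
}
```

Without the second rule the recursion could never terminate successfully,
which is why at least one rule must lead out of it.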
	412
	413	@cindex BNF
	414	@cindex Backus-Naur form
	415	The most common formal system for presenting such rules for humans to read
	416	is @dfn{Backus-Naur Form} or ``BNF'', which was developed in order to
	417	specify the language Algol 60. Any grammar expressed in BNF is a
	418	context-free grammar. The input to Bison is essentially machine-readable
	419	BNF.
	420
	421	Not all context-free languages can be handled by Bison, only those
	422	that are LALR(1). In brief, this means that it must be possible to
	423	tell how to parse any portion of an input string with just a single
	424	token of look-ahead. Strictly speaking, that is a description of an
	425	LR(1) grammar, and LALR(1) involves additional restrictions that are
	426	hard to explain simply; but it is rare in actual practice to find an
	427	LR(1) grammar that fails to be LALR(1). @xref{Mystery Conflicts, ,
	428	Mysterious Reduce/Reduce Conflicts}, for more information on this.
	429
	430	@cindex symbols (abstract)
	431	@cindex token
	432	@cindex syntactic grouping
	433	@cindex grouping, syntactic
	434	In the formal grammatical rules for a language, each kind of syntactic unit
	435	or grouping is named by a @dfn{symbol}. Those which are built by grouping
	436	smaller constructs according to grammatical rules are called
	437	@dfn{nonterminal symbols}; those which can't be subdivided are called
	438	@dfn{terminal symbols} or @dfn{token types}. We call a piece of input
	439	corresponding to a single terminal symbol a @dfn{token}, and a piece
	440	corresponding to a single nonterminal symbol a @dfn{grouping}.
	441
	442	We can use the C language as an example of what symbols, terminal and
	443	nonterminal, mean. The tokens of C are identifiers, constants (numeric and
	444	string), and the various keywords, arithmetic operators and punctuation
	445	marks. So the terminal symbols of a grammar for C include `identifier',
	446	`number', `string', plus one symbol for each keyword, operator or
	447	punctuation mark: `if', `return', `const', `static', `int', `char',
	448	`plus-sign', `open-brace', `close-brace', `comma' and many more. (These
	449	tokens can be subdivided into characters, but that is a matter of
	450	lexicography, not grammar.)
	451
	452	Here is a simple C function subdivided into tokens:
	453
	454	@ifinfo
	455	@example
	456	int /* @r{keyword `int'} */
	457	square (int x) /* @r{identifier, open-paren, keyword `int',}
	458	@r{identifier, close-paren} */
	459	@{ /* @r{open-brace} */
	460	return x * x; /* @r{keyword `return', identifier, asterisk,
	461	identifier, semicolon} */
	462	@} /* @r{close-brace} */
	463	@end example
	464	@end ifinfo
	465	@ifnotinfo
	466	@example
	467	int /* @r{keyword `int'} */
	468	square (int x) /* @r{identifier, open-paren, keyword `int', identifier, close-paren} */
	469	@{ /* @r{open-brace} */
	470	return x * x; /* @r{keyword `return', identifier, asterisk, identifier, semicolon} */
	471	@} /* @r{close-brace} */
	472	@end example
	473	@end ifnotinfo
	474
	475	The syntactic groupings of C include the expression, the statement, the
	476	declaration, and the function definition. These are represented in the
	477	grammar of C by nonterminal symbols `expression', `statement',
	478	`declaration' and `function definition'. The full grammar uses dozens of
	479	additional language constructs, each with its own nonterminal symbol, in
	480	order to express the meanings of these four. The example above is a
	481	function definition; it contains one declaration, and one statement. In
	482	the statement, each @samp{x} is an expression and so is @samp{x * x}.
	483
	484	Each nonterminal symbol must have grammatical rules showing how it is made
	485	out of simpler constructs. For example, one kind of C statement is the
	486	@code{return} statement; this would be described with a grammar rule which
	487	reads informally as follows:
	488
	489	@quotation
	490	A `statement' can be made of a `return' keyword, an `expression' and a
	491	`semicolon'.
	492	@end quotation
	493
	494	@noindent
	495	There would be many other rules for `statement', one for each kind of
	496	statement in C.
	497
	498	@cindex start symbol
	499	One nonterminal symbol must be distinguished as the special one which
	500	defines a complete utterance in the language. It is called the @dfn{start
	501	symbol}. In a compiler, this means a complete input program. In the C
	502	language, the nonterminal symbol `sequence of definitions and declarations'
	503	plays this role.
	504
	505	For example, @samp{1 + 2} is a valid C expression---a valid part of a C
	506	program---but it is not valid as an @emph{entire} C program. In the
	507	context-free grammar of C, this follows from the fact that `expression' is
	508	not the start symbol.
	509
	510	The Bison parser reads a sequence of tokens as its input, and groups the
	511	tokens using the grammar rules. If the input is valid, the end result is
	512	that the entire token sequence reduces to a single grouping whose symbol is
	513	the grammar's start symbol. If we use a grammar for C, the entire input
	514	must be a `sequence of definitions and declarations'. If not, the parser
	515	reports a syntax error.
	516
	517	@node Grammar in Bison
	518	@section From Formal Rules to Bison Input
	519	@cindex Bison grammar
	520	@cindex grammar, Bison
	521	@cindex formal grammar
	522
	523	A formal grammar is a mathematical construct. To define the language
	524	for Bison, you must write a file expressing the grammar in Bison syntax:
	525	a @dfn{Bison grammar} file. @xref{Grammar File, ,Bison Grammar Files}.
	526
	527	A nonterminal symbol in the formal grammar is represented in Bison input
	528	as an identifier, like an identifier in C. By convention, it should be
	529	in lower case, such as @code{expr}, @code{stmt} or @code{declaration}.
	530
	531	The Bison representation for a terminal symbol is also called a @dfn{token
	532	type}. Token types can also be represented as C-like identifiers. By
	533	convention, these identifiers should be upper case to distinguish them from
	534	nonterminals: for example, @code{INTEGER}, @code{IDENTIFIER}, @code{IF} or
	535	@code{RETURN}. A terminal symbol that stands for a particular keyword in
	536	the language should be named after that keyword converted to upper case.
	537	The terminal symbol @code{error} is reserved for error recovery.
	538	@xref{Symbols}.
	539
	540	A terminal symbol can also be represented as a character literal, just like
	541	a C character constant. You should do this whenever a token is just a
	542	single character (parenthesis, plus-sign, etc.): use that same character in
	543	a literal as the terminal symbol for that token.
	544
	545	A third way to represent a terminal symbol is with a C string constant
	546	containing several characters. @xref{Symbols}, for more information.
	547
	548	The grammar rules are also expressed in Bison syntax. For example,
	549	here is the Bison rule for a C @code{return} statement. The semicolon in
	550	quotes is a literal character token, representing part of the C syntax for
	551	the statement; the naked semicolon, and the colon, are Bison punctuation
	552	used in every rule.
	553
	554	@example
	555	stmt: RETURN expr ';'
	556	;
	557	@end example
	558
	559	@noindent
	560	@xref{Rules, ,Syntax of Grammar Rules}.
	561
	562	@node Semantic Values
	563	@section Semantic Values
	564	@cindex semantic value
	565	@cindex value, semantic
	566
	567	A formal grammar selects tokens only by their classifications: for example,
	568	if a rule mentions the terminal symbol `integer constant', it means that
	569	@emph{any} integer constant is grammatically valid in that position. The
	570	precise value of the constant is irrelevant to how to parse the input: if
	571	@samp{x+4} is grammatical then @samp{x+1} or @samp{x+3989} is equally
	572	grammatical.
	573
	574	But the precise value is very important for what the input means once it is
	575	parsed. A compiler is useless if it fails to distinguish between 4, 1 and
	576	3989 as constants in the program! Therefore, each token in a Bison grammar
	577	has both a token type and a @dfn{semantic value}. @xref{Semantics, ,Defining Language Semantics},
	578	for details.
	579
	580	The token type is a terminal symbol defined in the grammar, such as
	581	@code{INTEGER}, @code{IDENTIFIER} or @code{','}. It tells everything
	582	you need to know to decide where the token may validly appear and how to
	583	group it with other tokens. The grammar rules know nothing about tokens
	584	except their types.
	585
	586	The semantic value has all the rest of the information about the
	587	meaning of the token, such as the value of an integer, or the name of an
	588	identifier. (A token such as @code{','} which is just punctuation doesn't
	589	need to have any semantic value.)
	590
	591	For example, an input token might be classified as token type
	592	@code{INTEGER} and have the semantic value 4. Another input token might
	593	have the same token type @code{INTEGER} but value 3989. When a grammar
	594	rule says that @code{INTEGER} is allowed, either of these tokens is
	595	acceptable because each is an @code{INTEGER}. When the parser accepts the
	596	token, it keeps track of the token's semantic value.
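
The distinction between token type and semantic value can be sketched in C
as follows. This is only an illustration under stated assumptions: the
names @code{struct token}, @code{TOK_INTEGER} and
@code{rule_accepts_integer} are hypothetical and are not part of the
interface that Bison defines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical token types for this sketch.  */
enum token_type { TOK_INTEGER, TOK_IDENTIFIER, TOK_COMMA };

/* A token pairs a type, which is all the grammar rules look at, with
   a semantic value, which matters only once the input is parsed.  */
struct token
{
  enum token_type type;
  int value;                    /* meaningful for TOK_INTEGER only */
};

/* A rule that allows an INTEGER decides on the token type alone;
   the semantic value plays no part in the decision.  */
bool
rule_accepts_integer (const struct token *tok)
{
  assert (tok != NULL);
  return tok->type == TOK_INTEGER;
}
```

Two tokens with values 4 and 3989 are both accepted by such a rule, yet
their semantic values remain distinct for later use.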
	597
	598	Each grouping can have a semantic value as well as its nonterminal
	599	symbol. For example, in a calculator, an expression typically has a
	600	semantic value that is a number. In a compiler for a programming
	601	language, an expression typically has a semantic value that is a tree
	602	structure describing the meaning of the expression.
	603
	604	@node Semantic Actions
	605	@section Semantic Actions
	606	@cindex semantic actions
	607	@cindex actions, semantic
	608
	609	In order to be useful, a program must do more than parse input; it must
	610	also produce some output based on the input. In a Bison grammar, a grammar
	611	rule can have an @dfn{action} made up of C statements. Each time the
	612	parser recognizes a match for that rule, the action is executed.
	613	@xref{Actions}.
	614
	615	Most of the time, the purpose of an action is to compute the semantic value
	616	of the whole construct from the semantic values of its parts. For example,
	617	suppose we have a rule which says an expression can be the sum of two
	618	expressions. When the parser recognizes such a sum, each of the
	619	subexpressions has a semantic value which describes how it was built up.
	620	The action for this rule should create a similar sort of value for the
	621	newly recognized larger expression.
	622
	623	For example, here is a rule that says an expression can be the sum of
	624	two subexpressions:
	625
	626	@example
	627	expr: expr '+' expr @{ $$ = $1 + $3; @}
	628	;
	629	@end example
	630
	631	@noindent
	632	The action says how to produce the semantic value of the sum expression
	633	from the values of the two subexpressions.
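
To make the effect of such an action concrete, the following C sketch
mimics what happens to the semantic values when the rule is recognized.
It is a simplified model, not the code Bison generates: the type
@code{struct value_stack} and the functions @code{push_value} and
@code{reduce_sum} are assumptions invented for this illustration.

```c
#include <assert.h>

#define STACK_MAX 16

/* Hypothetical stack of semantic values, one entry per symbol.  */
struct value_stack
{
  int values[STACK_MAX];
  int depth;
};

void
push_value (struct value_stack *s, int v)
{
  assert (s->depth < STACK_MAX);
  s->values[s->depth++] = v;
}

/* Model of recognizing `expr: expr '+' expr'.  The top three entries
   correspond to $1, $2 (the '+' token, whose value is unused) and $3.
   They are replaced by one entry holding $$.  */
void
reduce_sum (struct value_stack *s)
{
  assert (s->depth >= 3);
  int v1 = s->values[s->depth - 3];     /* $1 */
  int v3 = s->values[s->depth - 1];     /* $3 */
  s->depth -= 3;                        /* drop the three components */
  push_value (s, v1 + v3);              /* $$ = $1 + $3 */
}
```

After the call, the three component values have been replaced by a single
value for the larger expression.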
	634
	635	@node Locations Overview
	636	@section Locations
	637	@cindex location
	638	@cindex textual position
	639	@cindex position, textual
	640
	641	Many applications, like interpreters or compilers, have to produce verbose
	642	and useful error messages. To achieve this, one must be able to keep track of
	643	the @dfn{textual position}, or @dfn{location}, of each syntactic construct.
	644	Bison provides a mechanism for handling these locations.
	645
	646	Each token has a semantic value. In a similar fashion, each token has an
	647	associated location, but the type of locations is the same for all tokens and
	648	groupings. Moreover, the output parser is equipped with a default data
	649	structure for storing locations (@pxref{Locations}, for more details).
	650
	651	Like semantic values, locations can be reached in actions using a dedicated
	652	set of constructs. In the example above, the location of the whole grouping
	653	is @code{@@$}, while the locations of the subexpressions are @code{@@1} and
	654	@code{@@3}.
	655
	656	When a rule is matched, a default action is used to compute the semantic value
	657	of its left-hand side (@pxref{Actions}). In the same way, another default
	658	action is used for locations. However, the action for locations is general
	659	enough for most cases, meaning there is usually no need to describe for each
	660	rule how @code{@@$} should be formed. When building a new location for a given
	661	grouping, the default behavior of the output parser is to take the beginning
	662	of the first symbol, and the end of the last symbol.
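
The default behavior just described can be sketched in C as below. The
record @code{struct location}, its field names and the function
@code{span_location} are assumptions chosen for this illustration; the
actual location type and the way the parser computes it are described in
the reference chapters.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical location record: where a construct begins and ends.  */
struct location
{
  int first_line, first_column;
  int last_line, last_column;
};

/* Build the location of a grouping from the locations of its N
   components RHS[0] .. RHS[N-1]: take the beginning of the first
   symbol and the end of the last symbol.  */
struct location
span_location (const struct location *rhs, int n)
{
  assert (rhs != NULL && n >= 1);

  struct location result;
  result.first_line = rhs[0].first_line;
  result.first_column = rhs[0].first_column;
  result.last_line = rhs[n - 1].last_line;
  result.last_column = rhs[n - 1].last_column;
  return result;
}
```

For the sum rule shown earlier, the components would be the locations
written @code{@@1}, @code{@@2} and @code{@@3}, and the result would play
the role of @code{@@$}.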
	663
	664	@node Bison Parser
	665	@section Bison Output: the Parser File
	666	@cindex Bison parser
	667	@cindex Bison utility
	668	@cindex lexical analyzer, purpose
	669	@cindex parser
	670
	671	When you run Bison, you give it a Bison grammar file as input. The output
	672	is a C source file that parses the language described by the grammar.
	673	This file is called a @dfn{Bison parser}. Keep in mind that the Bison
	674	utility and the Bison parser are two distinct programs: the Bison utility
	675	is a program whose output is the Bison parser that becomes part of your
	676	program.
	677
	678	The job of the Bison parser is to group tokens into groupings according to
	679	the grammar rules---for example, to build identifiers and operators into
	680	expressions. As it does this, it runs the actions for the grammar rules it
	681	uses.
	682
	683	The tokens come from a function called the @dfn{lexical analyzer} that
	684	you must supply in some fashion (such as by writing it in C). The Bison
	685	parser calls the lexical analyzer each time it wants a new token. It
	686	doesn't know what is ``inside'' the tokens (though their semantic values
	687	may reflect this). Typically the lexical analyzer makes the tokens by
	688	parsing characters of text, but Bison does not depend on this.
	689	@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
	690
	691	The Bison parser file is C code which defines a function named
	692	@code{yyparse} which implements that grammar. This function does not make
	693	a complete C program: you must supply some additional functions. One is
	694	the lexical analyzer. Another is an error-reporting function which the
	695	parser calls to report an error. In addition, a complete C program must
	696	start with a function called @code{main}; you have to provide this, and
	697	arrange for it to call @code{yyparse} or the parser will never run.
	698	@xref{Interface, ,Parser C-Language Interface}.
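
The division of labor between these functions can be sketched as follows.
This is a minimal sketch under loud assumptions: the @code{yyparse} shown
here is only a stand-in so that the fragment compiles by itself (the real
one comes from the Bison parser file), and the variable @code{input}, the
counter @code{error_count} and the helper @code{parse_text} are
hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>

static const char *input = "";  /* hypothetical: text still to scan */
static int error_count = 0;     /* hypothetical: number of reports */

/* Lexical analyzer: skip blanks, then return the next character as
   the token, or 0 at the end of the input.  */
int
yylex (void)
{
  while (*input == ' ' || *input == '\t')
    input++;
  if (*input == '\0')
    return 0;
  return (unsigned char) *input++;
}

/* Error-reporting function, called by the parser.  */
void
yyerror (const char *message)
{
  error_count++;
  fprintf (stderr, "%s\n", message);
}

/* STAND-IN for the generated parser, not Bison output: it accepts a
   sequence of digit tokens and rejects anything else.  Like the real
   function, it returns 0 on success and 1 on failure.  */
int
yyparse (void)
{
  int token;
  while ((token = yylex ()) != 0)
    if (!isdigit (token))
      {
        yyerror ("parse error");
        return 1;
      }
  return 0;
}

/* Controlling code: set up the input, then run the parser.  */
int
parse_text (const char *text)
{
  assert (text != NULL);
  input = text;
  return yyparse ();
}
```

In a complete program, @code{main} would take the place of
@code{parse_text}: it would arrange for the input to be available to
@code{yylex} and then call @code{yyparse}.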
	699
	700	Aside from the token type names and the symbols in the actions you
	701	write, all symbols defined in the Bison parser file itself
	702	begin with @samp{yy} or @samp{YY}. This includes interface functions
	703	such as the lexical analyzer function @code{yylex}, the error reporting
	704	function @code{yyerror} and the parser function @code{yyparse} itself.
	705	This also includes numerous identifiers used for internal purposes.
	706	Therefore, you should avoid using C identifiers starting with @samp{yy}
	707	or @samp{YY} in the Bison grammar file except for the ones defined in
	708	this manual.
	709
	710	In some cases the Bison parser file includes system headers, and in
	711	those cases your code should respect the identifiers reserved by those
	712	headers. On some non-@sc{gnu} hosts, @code{<alloca.h>},
	713	@code{<stddef.h>}, and @code{<stdlib.h>} are included as needed to
	714	declare memory allocators and related types. Other system headers may
	715	be included if you define @code{YYDEBUG} to a nonzero value
	716	(@pxref{Tracing, ,Tracing Your Parser}).
	717
	718	@node Stages
	719	@section Stages in Using Bison
	720	@cindex stages in using Bison
	721	@cindex using Bison
	722
	723	The actual language-design process using Bison, from grammar specification
	724	to a working compiler or interpreter, has these parts:
	725
	726	@enumerate
	727	@item
	728	Formally specify the grammar in a form recognized by Bison
	729	(@pxref{Grammar File, ,Bison Grammar Files}). For each grammatical rule
	730	in the language, describe the action that is to be taken when an
	731	instance of that rule is recognized. The action is described by a
	732	sequence of C statements.
	733
	734	@item
	735	Write a lexical analyzer to process input and pass tokens to the parser.
	736	The lexical analyzer may be written by hand in C (@pxref{Lexical, ,The
	737	Lexical Analyzer Function @code{yylex}}). It could also be produced
	738	using Lex, but the use of Lex is not discussed in this manual.
	739
	740	@item
	741	Write a controlling function that calls the Bison-produced parser.
	742
	743	@item
	744	Write error-reporting routines.
	745	@end enumerate
	746
	747	To turn this source code as written into a runnable program, you
	748	must follow these steps:
	749
	750	@enumerate
	751	@item
	752	Run Bison on the grammar to produce the parser.
	753
	754	@item
	755	Compile the code output by Bison, as well as any other source files.
	756
	757	@item
	758	Link the object files to produce the finished product.
	759	@end enumerate
	760
	761	@node Grammar Layout
	762	@section The Overall Layout of a Bison Grammar
	763	@cindex grammar file
	764	@cindex file format
	765	@cindex format of grammar file
	766	@cindex layout of Bison grammar
	767
	768	The input file for the Bison utility is a @dfn{Bison grammar file}. The
	769	general form of a Bison grammar file is as follows:
	770
	771	@example
	772	%@{
	773	@var{Prologue}
	774	%@}
	775
	776	@var{Bison declarations}
	777
	778	%%
	779	@var{Grammar rules}
	780	%%
	781	@var{Epilogue}
	782	@end example
	783
	784	@noindent
	785	The @samp{%%}, @samp{%@{} and @samp{%@}} are punctuation that appears
	786	in every Bison grammar file to separate the sections.
	787
	788	The prologue may define types and variables used in the actions. You can
	789	also use preprocessor commands to define macros used there, and use
	790	@code{#include} to include header files that do any of these things.
	791
	792	The Bison declarations declare the names of the terminal and nonterminal
	793	symbols, and may also describe operator precedence and the data types of
	794	semantic values of various symbols.
	795
	796	The grammar rules define how to construct each nonterminal symbol from its
	797	parts.
	798
	799	The epilogue can contain any code you want to use. Often the definition of
	800	the lexical analyzer @code{yylex} goes here, plus subroutines called by the
	801	actions in the grammar rules. In a simple program, all the rest of the
	802	program can go here.
	803
	804	@node Examples
	805	@chapter Examples
	806	@cindex simple examples
	807	@cindex examples, simple
	808
	809	Now we show and explain three sample programs written using Bison: a
	810	reverse polish notation calculator, an algebraic (infix) notation
	811	calculator, and a multi-function calculator. All three have been tested
	812	under BSD Unix 4.3; each produces a usable, though limited, interactive
	813	desk-top calculator.
	814
	815	These examples are simple, but Bison grammars for real programming
	816	languages are written the same way.
	817	@ifinfo
	818	You can copy these examples out of the Info file and into a source file
	819	to try them.
	820	@end ifinfo
	821
	822	@menu
	823	* RPN Calc:: Reverse polish notation calculator;
	824	a first example with no operator precedence.
	825	* Infix Calc:: Infix (algebraic) notation calculator.
	826	Operator precedence is introduced.
	827	* Simple Error Recovery:: Continuing after syntax errors.
	828	* Location Tracking Calc:: Demonstrating the use of @@@var{n} and @@$.
	829	* Multi-function Calc:: Calculator with memory and trig functions.
	830	It uses multiple data-types for semantic values.
	831	* Exercises:: Ideas for improving the multi-function calculator.
	832	@end menu
	833
	834	@node RPN Calc
	835	@section Reverse Polish Notation Calculator
	836	@cindex reverse polish notation
	837	@cindex polish notation calculator
	838	@cindex @code{rpcalc}
	839	@cindex calculator, simple
	840
	841	The first example is that of a simple double-precision @dfn{reverse polish
	842	notation} calculator (a calculator using postfix operators). This example
	843	provides a good starting point, since operator precedence is not an issue.
	844	The second example will illustrate how operator precedence is handled.
	845
	846	The source code for this calculator is named @file{rpcalc.y}. The
	847	@samp{.y} extension is a convention used for Bison input files.
	848
	849	@menu
	850	* Decls: Rpcalc Decls. Prologue (declarations) for rpcalc.
	851	* Rules: Rpcalc Rules. Grammar Rules for rpcalc, with explanation.
	852	* Lexer: Rpcalc Lexer. The lexical analyzer.
	853	* Main: Rpcalc Main. The controlling function.
	854	* Error: Rpcalc Error. The error reporting function.
	855	* Gen: Rpcalc Gen. Running Bison on the grammar file.
	856	* Comp: Rpcalc Compile. Run the C compiler on the output code.
	857	@end menu
	858
	859	@node Rpcalc Decls
	860	@subsection Declarations for @code{rpcalc}
	861
	862	Here are the C and Bison declarations for the reverse polish notation
	863	calculator. As in C, comments are placed between @samp{/*@dots{}*/}.
	864
	865	@example
	866	/* Reverse polish notation calculator. */
	867
	868	%@{
	869	#define YYSTYPE double
	870	#include <math.h>
	871	%@}
	872
	873	%token NUM
	874
	875	%% /* Grammar rules and actions follow */
	876	@end example
	877
	878	The declarations section (@pxref{Prologue, , The prologue}) contains two
	879	preprocessor directives.
	880
	881	The @code{#define} directive defines the macro @code{YYSTYPE}, thus
	882	specifying the C data type for semantic values of both tokens and
	883	groupings (@pxref{Value Type, ,Data Types of Semantic Values}). The
	884	Bison parser will use whatever type @code{YYSTYPE} is defined as; if you
	885	don't define it, @code{int} is the default. Because we specify
	886	@code{double}, each token and each expression has an associated value,
	887	which is a floating point number.
	888
	889	The @code{#include} directive is used to declare the exponentiation
	890	function @code{pow}.
	891
	892	The second section, Bison declarations, provides information to Bison
	893	about the token types (@pxref{Bison Declarations, ,The Bison
	894	Declarations Section}). Each terminal symbol that is not a
	895	single-character literal must be declared here. (Single-character
	896	literals normally don't need to be declared.) In this example, all the
	897	arithmetic operators are designated by single-character literals, so the
	898	only terminal symbol that needs to be declared is @code{NUM}, the token
	899	type for numeric constants.
	900
	901	@node Rpcalc Rules
	902	@subsection Grammar Rules for @code{rpcalc}
	903
	904	Here are the grammar rules for the reverse polish notation calculator.
	905
	906	@example
	907	input: /* empty */
	908	| input line
	909	;
	910
	911	line: '\n'
	912	| exp '\n' @{ printf ("\t%.10g\n", $1); @}
	913	;
	914
	915	exp: NUM @{ $$ = $1; @}
	916	| exp exp '+' @{ $$ = $1 + $2; @}
	917	| exp exp '-' @{ $$ = $1 - $2; @}
	918	| exp exp '*' @{ $$ = $1 * $2; @}
	919	| exp exp '/' @{ $$ = $1 / $2; @}
	920	/* Exponentiation */
	921	| exp exp '^' @{ $$ = pow ($1, $2); @}
	922	/* Unary minus */
	923	| exp 'n' @{ $$ = -$1; @}
	924	;
	925	%%
	926	@end example
	927
	928	The groupings of the rpcalc ``language'' defined here are the expression
	929	(given the name @code{exp}), the line of input (@code{line}), and the
	930	complete input transcript (@code{input}). Each of these nonterminal
	931	symbols has several alternate rules, joined by the @samp{|} punctuator
	932	which is read as ``or''. The following sections explain what these rules
	933	mean.
	934
	935	The semantics of the language is determined by the actions taken when a
	936	grouping is recognized. The actions are the C code that appears inside
	937	braces. @xref{Actions}.
	938
	939	You must specify these actions in C, but Bison provides the means for
	940	passing semantic values between the rules. In each action, the
	941	pseudo-variable @code{$$} stands for the semantic value for the grouping
	942	that the rule is going to construct. Assigning a value to @code{$$} is the
	943	main job of most actions. The semantic values of the components of the
	944	rule are referred to as @code{$1}, @code{$2}, and so on.
	945
	946	@menu
	947	* Rpcalc Input::
	948	* Rpcalc Line::
	949	* Rpcalc Expr::
	950	@end menu
	951
	952	@node Rpcalc Input
	953	@subsubsection Explanation of @code{input}
	954
	955	Consider the definition of @code{input}:
	956
	957	@example
	958	input: /* empty */
	959	| input line
	960	;
	961	@end example
	962
	963	This definition reads as follows: ``A complete input is either an empty
	964	string, or a complete input followed by an input line''. Notice that
	965	``complete input'' is defined in terms of itself. This definition is said
	966	to be @dfn{left recursive} since @code{input} always appears as the
	967	leftmost symbol in the sequence. @xref{Recursion, ,Recursive Rules}.
	968
	969	The first alternative is empty because there are no symbols between the
	970	colon and the first @samp{|}; this means that @code{input} can match an
	971	empty string of input (no tokens). We write the rules this way because it
	972	is legitimate to type @kbd{Ctrl-d} right after you start the calculator.
	973	It's conventional to put an empty alternative first and write the comment
	974	@samp{/* empty */} in it.
	975
	976	The second alternate rule (@code{input line}) handles all nontrivial input.
	977	It means, ``After reading any number of lines, read one more line if
	978	possible.'' The left recursion makes this rule into a loop. Since the
	979	first alternative matches empty input, the loop can be executed zero or
	980	more times.
	981
	982	The parser function @code{yyparse} continues to process input until a
	983	grammatical error is seen or the lexical analyzer says there are no more
	984	input tokens; we will arrange for the latter to happen at end of file.
	985
	986	@node Rpcalc Line
	987	@subsubsection Explanation of @code{line}
	988
	989	Now consider the definition of @code{line}:
	990
	991	@example
	992	line: '\n'
	993	| exp '\n' @{ printf ("\t%.10g\n", $1); @}
	994	;
	995	@end example
	996
	997	The first alternative is a token which is a newline character; this means
	998	that rpcalc accepts a blank line (and ignores it, since there is no
	999	action). The second alternative is an expression followed by a newline.
	1000	This is the alternative that makes rpcalc useful. The semantic value of
	1001	the @code{exp} grouping is the value of @code{$1} because the @code{exp} in
	1002	question is the first symbol in the alternative. The action prints this
	1003	value, which is the result of the computation the user asked for.
	1004
	1005	This action is unusual because it does not assign a value to @code{$$}. As
	1006	a consequence, the semantic value associated with the @code{line} is
	1007	uninitialized (its value will be unpredictable). This would be a bug if
	1008	that value were ever used, but we don't use it: once rpcalc has printed the
	1009	value of the user's input line, that value is no longer needed.
	1010
	1011	@node Rpcalc Expr
	1012	@subsubsection Explanation of @code{exp}
	1013
	1014	The @code{exp} grouping has several rules, one for each kind of expression.
	1015	The first rule handles the simplest expressions: those that are just numbers.
	1016	The second handles an addition-expression, which looks like two expressions
	1017	followed by a plus-sign. The third handles subtraction, and so on.
	1018
	1019	@example
	1020	exp: NUM
	1021	| exp exp '+' @{ $$ = $1 + $2; @}
	1022	| exp exp '-' @{ $$ = $1 - $2; @}
	1023	@dots{}
	1024	;
	1025	@end example
	1026
	1027	We have used @samp{|} to join all the rules for @code{exp}, but we could
	1028	equally well have written them separately:
	1029
	1030	@example
	1031	exp: NUM ;
	1032	exp: exp exp '+' @{ $$ = $1 + $2; @} ;
	1033	exp: exp exp '-' @{ $$ = $1 - $2; @} ;
	1034	@dots{}
	1035	@end example
	1036
	1037	Most of the rules have actions that compute the value of the expression in
	1038	terms of the value of its parts. For example, in the rule for addition,
	1039	@code{$1} refers to the first component @code{exp} and @code{$2} refers to
	1040	the second one. The third component, @code{'+'}, has no meaningful
	1041	associated semantic value, but if it had one you could refer to it as
	1042	@code{$3}. When @code{yyparse} recognizes a sum expression using this
	1043	rule, the sum of the two subexpressions' values is produced as the value of
	1044	the entire expression. @xref{Actions}.
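
To make the data flow concrete, here is a hand-written sketch, independent of Bison, of what the binary-operator actions above compute. It assumes a hypothetical fixed-size value stack and hypothetical helpers named @code{rpn_push}, @code{rpn_apply} and @code{rpn_top}; none of these is part of the generated parser, which manages its own stacks. Exponentiation and unary minus are left out to keep the sketch short.

```c
#include <stddef.h>

/* Hypothetical value stack; the generated parser keeps its own. */
#define RPN_STACK_MAX 64
static double rpn_stack[RPN_STACK_MAX];
static size_t rpn_depth = 0;

/* Mirrors `exp: NUM': the token's value becomes the value of the
   new grouping.  A push onto a full stack is silently ignored.  */
static void
rpn_push (double value)
{
  if (rpn_depth < RPN_STACK_MAX)
    rpn_stack[rpn_depth++] = value;
}

/* Mirrors `exp: exp exp OP': pop the values of the two component
   expressions ($1 and $2) and push the result ($$).  Returns 0 on
   success, -1 if OP is unknown or fewer than two values exist.  */
static int
rpn_apply (char op)
{
  double lhs, rhs;

  if (rpn_depth < 2)
    return -1;
  rhs = rpn_stack[--rpn_depth];   /* plays the role of $2 */
  lhs = rpn_stack[--rpn_depth];   /* plays the role of $1 */
  switch (op)
    {
    case '+': rpn_push (lhs + rhs); break;
    case '-': rpn_push (lhs - rhs); break;
    case '*': rpn_push (lhs * rhs); break;
    case '/': rpn_push (lhs / rhs); break;
    default:
      rpn_depth += 2;             /* put both operands back */
      return -1;
    }
  return 0;
}

/* Value of the most recently completed expression, or 0 if none.  */
static double
rpn_top (void)
{
  return rpn_depth ? rpn_stack[rpn_depth - 1] : 0.0;
}
```

Calling @code{rpn_push (4)}, @code{rpn_push (9)} and then @code{rpn_apply ('+')} leaves the sum on top of the stack, much as the addition rule leaves it in @code{$$}.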
	1045
	1046	You don't have to give an action for every rule. When a rule has no
	1047	action, Bison by default copies the value of @code{$1} into @code{$$}.
	1048	This is what happens in the first rule (the one that uses @code{NUM}).
	1049
	1050	The formatting shown here is the recommended convention, but Bison does
	1051	not require it. You can add or change whitespace as much as you wish.
	1052	For example, this:
	1053
	1054	@example
	1055	exp : NUM | exp exp '+' @{$$ = $1 + $2; @} | @dots{}
	1056	@end example
	1057
	1058	@noindent
	1059	means the same thing as this:
	1060
	1061	@example
	1062	exp: NUM
	1063	| exp exp '+' @{ $$ = $1 + $2; @}
	1064	| @dots{}
	1065	@end example
	1066
	1067	@noindent
	1068	The latter, however, is much more readable.
	1069
	1070	@node Rpcalc Lexer
	1071	@subsection The @code{rpcalc} Lexical Analyzer
	1072	@cindex writing a lexical analyzer
	1073	@cindex lexical analyzer, writing
	1074
	1075	The lexical analyzer's job is low-level parsing: converting characters
	1076	or sequences of characters into tokens. The Bison parser gets its
	1077	tokens by calling the lexical analyzer. @xref{Lexical, ,The Lexical
	1078	Analyzer Function @code{yylex}}.
	1079
	1080	Only a simple lexical analyzer is needed for the RPN calculator. This
	1081	lexical analyzer skips blanks and tabs, then reads in numbers as
	1082	@code{double} and returns them as @code{NUM} tokens. Any other character
	1083	that isn't part of a number is a separate token. Note that the token-code
	1084	for such a single-character token is the character itself.
	1085
	1086	The return value of the lexical analyzer function is a numeric code which
	1087	represents a token type. The same text used in Bison rules to stand for
	1088	this token type is also a C expression for the numeric code for the type.
	1089	This works in two ways. If the token type is a character literal, then its
	1090	numeric code is that of the character; you can use the same
	1091	character literal in the lexical analyzer to express the number. If the
	1092	token type is an identifier, that identifier is defined by Bison as a C
	1093	macro whose definition is the appropriate number. In this example,
	1094	therefore, @code{NUM} becomes a macro for @code{yylex} to use.
	1095
	1096	The semantic value of the token (if it has one) is stored into the
	1097	global variable @code{yylval}, which is where the Bison parser will look
	1098	for it. (The C data type of @code{yylval} is @code{YYSTYPE}, which was
	1099	defined at the beginning of the grammar; @pxref{Rpcalc Decls,
	1100	,Declarations for @code{rpcalc}}.)
	1101
	1102	A token type code of zero is returned if the end-of-file is encountered.
	1103	(Bison recognizes any nonpositive value as indicating the end of the
	1104	input.)
	1105
	1106	Here is the code for the lexical analyzer:
	1107
	1108	@example
	1109	@group
	1110	/* Lexical analyzer returns a double floating point
	1111	number on the stack and the token NUM, or the numeric code
	1112	of the character read if not a number. Skips all blanks
	1113	and tabs, returns 0 for EOF. */
	1114
	1115	#include <ctype.h>
	1116	@end group
	1117
	1118	@group
	1119	int
	1120	yylex (void)
	1121	@{
	1122	int c;
	1123
	1124	/* skip white space */
	1125	while ((c = getchar ()) == ' ' || c == '\t')
	1126	;
	1127	@end group
	1128	@group
	1129	/* process numbers */
	1130	if (c == '.' || isdigit (c))
	1131	@{
	1132	ungetc (c, stdin);
	1133	scanf ("%lf", &yylval);
	1134	return NUM;
	1135	@}
	1136	@end group
	1137	@group
	1138	/* return end-of-file */
	1139	if (c == EOF)
	1140	return 0;
	1141	/* return single chars */
	1142	return c;
	1143	@}
	1144	@end group
	1145	@end example
	1146
	1147	@node Rpcalc Main
	1148	@subsection The Controlling Function
	1149	@cindex controlling function
	1150	@cindex main function in simple example
	1151
	1152	In keeping with the spirit of this example, the controlling function is
	1153	kept to the bare minimum. The only requirement is that it call
	1154	@code{yyparse} to start the process of parsing.
	1155
	1156	@example
	1157	@group
	1158	int
	1159	main (void)
	1160	@{
	1161	return yyparse ();
	1162	@}
	1163	@end group
	1164	@end example
	1165
	1166	@node Rpcalc Error
	1167	@subsection The Error Reporting Routine
	1168	@cindex error reporting routine
	1169
	1170	When @code{yyparse} detects a syntax error, it calls the error reporting
	1171	function @code{yyerror} to print an error message (usually but not
	1172	always @code{"parse error"}). It is up to the programmer to supply
	1173	@code{yyerror} (@pxref{Interface, ,Parser C-Language Interface}), so
	1174	here is the definition we will use:
	1175
	1176	@example
	1177	@group
	1178	#include <stdio.h>
	1179
	1180	void
	1181	yyerror (const char *s) /* Called by yyparse on error */
	1182	@{
	1183	printf ("%s\n", s);
	1184	@}
	1185	@end group
	1186	@end example
	1187
	1188	After @code{yyerror} returns, the Bison parser may recover from the error
	1189	and continue parsing if the grammar contains a suitable error rule
	1190	(@pxref{Error Recovery}). Otherwise, @code{yyparse} returns nonzero. We
	1191	have not written any error rules in this example, so any invalid input will
	1192	cause the calculator program to exit. This is not clean behavior for a
	1193	real calculator, but it is adequate for the first example.
	1194
	1195	@node Rpcalc Gen
	1196	@subsection Running Bison to Make the Parser
	1197	@cindex running Bison (introduction)
	1198
	1199	Before running Bison to produce a parser, we need to decide how to
	1200	arrange all the source code in one or more source files. For such a
	1201	simple example, the easiest thing is to put everything in one file. The
	1202	definitions of @code{yylex}, @code{yyerror} and @code{main} go at the
	1203	end, in the epilogue of the file
	1204	(@pxref{Grammar Layout, ,The Overall Layout of a Bison Grammar}).
	1205
	1206	For a large project, you would probably have several source files, and use
	1207	@code{make} to arrange to recompile them.
	1208
	1209	With all the source in a single file, you use the following command to
	1210	convert it into a parser file:
	1211
	1212	@example
	1213	bison @var{file_name}.y
	1214	@end example
	1215
	1216	@noindent
	1217	In this example the file was called @file{rpcalc.y} (for ``Reverse Polish
	1218	CALCulator''). Bison produces a file named @file{@var{file_name}.tab.c},
	1219	removing the @samp{.y} from the original file name. The file output by
	1220	Bison contains the source code for @code{yyparse}. The additional
	1221	functions in the input file (@code{yylex}, @code{yyerror} and @code{main})
	1222	are copied verbatim to the output.
	1223
	1224	@node Rpcalc Compile
	1225	@subsection Compiling the Parser File
	1226	@cindex compiling the parser
	1227
	1228	Here is how to compile and run the parser file:
	1229
	1230	@example
	1231	@group
	1232	# @r{List files in current directory.}
	1233	$ @kbd{ls}
	1234	rpcalc.tab.c rpcalc.y
	1235	@end group
	1236
	1237	@group
	1238	# @r{Compile the Bison parser.}
	1239	# @r{@samp{-lm} tells compiler to search math library for @code{pow}.}
	1240	$ @kbd{cc rpcalc.tab.c -lm -o rpcalc}
	1241	@end group
	1242
	1243	@group
	1244	# @r{List files again.}
	1245	$ @kbd{ls}
	1246	rpcalc rpcalc.tab.c rpcalc.y
	1247	@end group
	1248	@end example
	1249
	1250	The file @file{rpcalc} now contains the executable code. Here is an
	1251	example session using @code{rpcalc}.
	1252
	1253	@example
	1254	$ @kbd{rpcalc}
	1255	@kbd{4 9 +}
	1256	13
	1257	@kbd{3 7 + 3 4 5 *+-}
	1258	-13
	1259	@kbd{3 7 + 3 4 5 * + - n} @r{Note the unary minus, @samp{n}}
	1260	13
	1261	@kbd{5 6 / 4 n +}
	1262	-3.166666667
	1263	@kbd{3 4 ^} @r{Exponentiation}
	1264	81
	1265	@kbd{^D} @r{End-of-file indicator}
	1266	$
	1267	@end example
	1268
	1269	@node Infix Calc
	1270	@section Infix Notation Calculator: @code{calc}
	1271	@cindex infix notation calculator
	1272	@cindex @code{calc}
	1273	@cindex calculator, infix notation
	1274
	1275	We now modify rpcalc to handle infix operators instead of postfix. Infix
	1276	notation involves the concept of operator precedence and the need for
	1277	parentheses nested to arbitrary depth. Here is the Bison code for
	1278	@file{calc.y}, an infix desk-top calculator.
	1279
	1280	@example
	1281	/* Infix notation calculator--calc */
	1282
	1283	%@{
	1284	#define YYSTYPE double
	1285	#include <math.h>
	1286	%@}
	1287
	1288	/* BISON Declarations */
	1289	%token NUM
	1290	%left '-' '+'
	1291	%left '*' '/'
	1292	%left NEG /* negation--unary minus */
	1293	%right '^' /* exponentiation */
	1294
	1295	/* Grammar follows */
	1296	%%
	1297	input: /* empty string */
	1298	| input line
	1299	;
	1300
	1301	line: '\n'
	1302	| exp '\n' @{ printf ("\t%.10g\n", $1); @}
	1303	;
	1304
	1305	exp: NUM @{ $$ = $1; @}
	1306	| exp '+' exp @{ $$ = $1 + $3; @}
	1307	| exp '-' exp @{ $$ = $1 - $3; @}
	1308	| exp '*' exp @{ $$ = $1 * $3; @}
	1309	| exp '/' exp @{ $$ = $1 / $3; @}
	1310	| '-' exp %prec NEG @{ $$ = -$2; @}
	1311	| exp '^' exp @{ $$ = pow ($1, $3); @}
	1312	| '(' exp ')' @{ $$ = $2; @}
	1313	;
	1314	%%
	1315	@end example
	1316
	1317	@noindent
	1318	The functions @code{yylex}, @code{yyerror} and @code{main} can be the
	1319	same as before.
	1320
	1321	There are two important new features shown in this code.
	1322
	1323	In the second section (Bison declarations), @code{%left} declares token
	1324	types and says they are left-associative operators. The declarations
	1325	@code{%left} and @code{%right} (right associativity) take the place of
	1326	@code{%token} which is used to declare a token type name without
	1327	associativity. (These tokens are single-character literals, which
	1328	ordinarily don't need to be declared. We declare them here to specify
	1329	the associativity.)
	1330
	1331	Operator precedence is determined by the line ordering of the
	1332	declarations; the higher the line number of the declaration (lower on
	1333	the page or screen), the higher the precedence. Hence, exponentiation
	1334	has the highest precedence, unary minus (@code{NEG}) is next, followed
	1335	by @samp{*} and @samp{/}, and so on. @xref{Precedence, ,Operator
	1336	Precedence}.
	1337
	1338	The other important new feature is the @code{%prec} in the grammar
	1339	section for the unary minus operator. The @code{%prec} simply instructs
	1340	Bison that the rule @samp{| '-' exp} has the same precedence as
	1341	@code{NEG}---in this case the next-to-highest. @xref{Contextual
	1342	Precedence, ,Context-Dependent Precedence}.
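
As a rough illustration of what these declarations decide, the sketch below spells out in plain C the groupings the parser would choose for two inputs. The helper @code{ipow} and the two function names are hypothetical; @code{ipow} only stands in for the @code{pow} call in the grammar's action so that the sketch needs no math library.

```c
/* Hypothetical integer power helper, standing in for pow.  */
static long
ipow (long base, unsigned exponent)
{
  long result = 1;
  while (exponent-- > 0)
    result *= base;
  return result;
}

/* Input "-2 ^ 2": '^' is declared after NEG, so it binds tighter
   and the input groups as -(2 ^ 2).  */
static long
negated_power (void)
{
  return -ipow (2, 2);
}

/* Input "2 ^ 3 ^ 2": %right makes '^' group from the right,
   that is, as 2 ^ (3 ^ 2).  */
static long
chained_power (void)
{
  return ipow (2, (unsigned) ipow (3, 2));
}
```

Had @samp{^} been declared with @code{%left}, the second input would instead group as @samp{(2 ^ 3) ^ 2}, which gives a different value.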
	1343
	1344	Here is a sample run of @file{calc.y}:
	1345
	1346	@need 500
	1347	@example
	1348	$ @kbd{calc}
	1349	@kbd{4 + 4.5 - (34/(8*3+-3))}
	1350	6.880952381
	1351	@kbd{-56 + 2}
	1352	-54
	1353	@kbd{3 ^ 2}
	1354	9
	1355	@end example
	1356
	1357	@node Simple Error Recovery
	1358	@section Simple Error Recovery
	1359	@cindex error recovery, simple
	1360
	1361	Up to this point, this manual has not addressed the issue of @dfn{error
	1362	recovery}---how to continue parsing after the parser detects a syntax
	1363	error. All we have handled is error reporting with @code{yyerror}.
	1364	Recall that by default @code{yyparse} returns after calling
	1365	@code{yyerror}. This means that an erroneous input line causes the
	1366	calculator program to exit. Now we show how to rectify this deficiency.
	1367
	1368	The Bison language itself includes the reserved word @code{error}, which
	1369	may be included in the grammar rules. In the example below it has
	1370	been added to one of the alternatives for @code{line}:
	1371
	1372	@example
	1373	@group
	1374	line: '\n'
	1375	| exp '\n' @{ printf ("\t%.10g\n", $1); @}
	1376	| error '\n' @{ yyerrok; @}
	1377	;
	1378	@end group
	1379	@end example
	1380
	1381	This addition to the grammar allows for simple error recovery in the
	1382	event of a parse error. If an expression that cannot be evaluated is
	1383	read, the error will be recognized by the third rule for @code{line},
	1384	and parsing will continue. (The @code{yyerror} function is still called
	1385	upon to print its message as well.) The action executes the statement
	1386	@code{yyerrok}, a macro defined automatically by Bison; its meaning is
	1387	that error recovery is complete (@pxref{Error Recovery}). Note the
	1388	difference between @code{yyerrok} and @code{yyerror}; neither one is a
	1389	misprint.
	1390
	1391	This form of error recovery deals with syntax errors. There are other
	1392	kinds of errors; for example, division by zero, which raises an exception
	1393	signal that is normally fatal. A real calculator program must handle this
	1394	signal and use @code{longjmp} to return to @code{main} and resume parsing
	1395	input lines; it would also have to discard the rest of the current line of
	1396	input. We won't discuss this issue further because it is not specific to
	1397	Bison programs.
	1398
	1399	@node Location Tracking Calc
	1400	@section Location Tracking Calculator: @code{ltcalc}
	1401	@cindex location tracking calculator
	1402	@cindex @code{ltcalc}
	1403	@cindex calculator, location tracking
	1404
	1405	This example extends the infix notation calculator with location
	1406	tracking. This feature will be used to improve the error messages. For
	1407	the sake of clarity, this example is a simple integer calculator, since
	1408	most of the work needed to use locations will be done in the lexical
	1409	analyzer.
	1410
	1411	@menu
	1412	* Decls: Ltcalc Decls. Bison and C declarations for ltcalc.
	1413	* Rules: Ltcalc Rules. Grammar rules for ltcalc, with explanations.
	1414	* Lexer: Ltcalc Lexer. The lexical analyzer.
	1415	@end menu
	1416
	1417	@node Ltcalc Decls
	1418	@subsection Declarations for @code{ltcalc}
	1419
	1420	The C and Bison declarations for the location tracking calculator are
	1421	the same as the declarations for the infix notation calculator, except that @code{YYSTYPE} is @code{int}.
	1422
	1423	@example
	1424	/* Location tracking calculator. */
	1425
	1426	%@{
	1427	#define YYSTYPE int
	1428	#include <math.h>
	1429	%@}
	1430
	1431	/* Bison declarations. */
	1432	%token NUM
	1433
	1434	%left '-' '+'
	1435	%left '*' '/'
	1436	%left NEG
	1437	%right '^'
	1438
	1439	%% /* Grammar follows */
	1440	@end example
	1441
	1442	@noindent
	1443	Note there are no declarations specific to locations. Defining a data
	1444	type for storing locations is not needed: we will use the type provided
	1445	by default (@pxref{Location Type, ,Data Types of Locations}), which is a
	1446	four member structure with the following integer fields:
	1447	@code{first_line}, @code{first_column}, @code{last_line} and
	1448	@code{last_column}.
	1449
	1450	@node Ltcalc Rules
	1451	@subsection Grammar Rules for @code{ltcalc}
	1452
	1453	Whether or not you handle locations has no effect on the syntax of your
	1454	language. Therefore, grammar rules for this example will be very close
	1455	to those of the previous example: we will only modify them to benefit
	1456	from the new information.
	1457
	1458	Here, we will use locations to report divisions by zero, and locate the
	1459	wrong expressions or subexpressions.
	1460
	1461	@example
	1462	@group
	1463	input : /* empty */
	1464	| input line
	1465	;
	1466	@end group
	1467
	1468	@group
	1469	line : '\n'
	1470	| exp '\n' @{ printf ("%d\n", $1); @}
	1471	;
	1472	@end group
	1473
	1474	@group
	1475	exp : NUM @{ $$ = $1; @}
	1476	| exp '+' exp @{ $$ = $1 + $3; @}
	1477	| exp '-' exp @{ $$ = $1 - $3; @}
	1478	| exp '*' exp @{ $$ = $1 * $3; @}
	1479	@end group
	1480	@group
	1481	| exp '/' exp
	1482	@{
	1483	if ($3)
	1484	$$ = $1 / $3;
	1485	else
	1486	@{
	1487	$$ = 1;
	1488	fprintf (stderr, "%d.%d-%d.%d: division by zero",
	1489	@@3.first_line, @@3.first_column,
	1490	@@3.last_line, @@3.last_column);
	1491	@}
	1492	@}
	1493	@end group
	1494	@group
	1495	| '-' exp %prec NEG @{ $$ = -$2; @}
	1496	| exp '^' exp @{ $$ = pow ($1, $3); @}
	1497	| '(' exp ')' @{ $$ = $2; @}
	1498	@end group
	1499	@end example
	1500
	1501	This code shows how to reach locations inside of semantic actions, by
	1502	using the pseudo-variables @code{@@@var{n}} for rule components, and the
	1503	pseudo-variable @code{@@$} for groupings.
	1504
	1505	We don't need to assign a value to @code{@@$}: the output parser does it
	1506	automatically. By default, before executing the C code of each action,
	1507	@code{@@$} is set to range from the beginning of @code{@@1} to the end
	1508	of @code{@@@var{n}}, for a rule with @var{n} components. This behavior
	1509	can be redefined (@pxref{Location Default Action, , Default Action for
	1510	Locations}), and for very specific rules, @code{@@$} can be computed by
	1511	hand.
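
The default computation described above can be sketched in plain C. This is only an illustration under stated assumptions: the type name @code{location} and the function @code{location_span} are hypothetical stand-ins, not Bison's own definitions, and the sketch does not handle rules with no components.

```c
/* Hypothetical stand-in for the default location type: four
   integer fields, as described for ltcalc.  */
typedef struct location
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* Sketch of the default rule: the grouping starts where its first
   component starts and ends where its last component ends.
   RHS points to the N component locations; N must be at least 1.  */
static location
location_span (const location *rhs, int n)
{
  location result;
  result.first_line   = rhs[0].first_line;
  result.first_column = rhs[0].first_column;
  result.last_line    = rhs[n - 1].last_line;
  result.last_column  = rhs[n - 1].last_column;
  return result;
}
```

For a rule such as @code{exp '/' exp}, the array would hold the three component locations, and the result would cover the whole division expression.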
	1512
	1513	@node Ltcalc Lexer
	1514	@subsection The @code{ltcalc} Lexical Analyzer.
	1515
	1516	Until now, we relied on Bison's defaults to enable location
	1517	tracking. The next step is to rewrite the lexical analyzer, and make it
	1518	able to feed the parser with the token locations, as it already does for
	1519	semantic values.
	1520
	1521	To this end, we must take into account every single character of the
	1522	input text, to keep the computed locations from being fuzzy or wrong:
	1523
	1524	@example
	1525	@group
	1526	int
	1527	yylex (void)
	1528	@{
	1529	int c;
	1530
	1531	/* skip white space */
	1532	while ((c = getchar ()) == ' ' || c == '\t')
	1533	++yylloc.last_column;
	1534
	1535	/* step */
	1536	yylloc.first_line = yylloc.last_line;
	1537	yylloc.first_column = yylloc.last_column;
	1538	@end group
	1539
	1540	@group
	1541	/* process numbers */
	1542	if (isdigit (c))
	1543	@{
	1544	yylval = c - '0';
	1545	++yylloc.last_column;
	1546	while (isdigit (c = getchar ()))
	1547	@{
	1548	++yylloc.last_column;
	1549	yylval = yylval * 10 + c - '0';
	1550	@}
	1551	ungetc (c, stdin);
	1552	return NUM;
	1553	@}
	1554	@end group
	1555
	1556	/* return end-of-file */
	1557	if (c == EOF)
	1558	return 0;
	1559
	1560	/* return single chars and update location */
	1561	if (c == '\n')
	1562	@{
	1563	++yylloc.last_line;
	1564	yylloc.last_column = 0;
	1565	@}
	1566	else
	1567	++yylloc.last_column;
	1568	return c;
	1569	@}
	1570	@end example
	1571
	1572	Basically, the lexical analyzer performs the same processing as before:
	1573	it skips blanks and tabs, and reads numbers or single-character tokens.
	1574	In addition, it updates @code{yylloc}, the global variable (of type
	1575	@code{YYLTYPE}) containing the token's location.
	1576
	1577	Now, each time this function returns a token, the parser has the token's
	1578	number, its semantic value, and its location in the text. The last
	1579	change needed is to initialize @code{yylloc}, for example in the
	1580	controlling function:
	1581
	1582	@example
	1583	@group
	1584	int
	1585	main (void)
	1586	@{
	1587	yylloc.first_line = yylloc.last_line = 1;
	1588	yylloc.first_column = yylloc.last_column = 0;
	1589	return yyparse ();
	1590	@}
	1591	@end group
	1592	@end example
	1593
	1594	Remember that computing locations is not a matter of syntax. Every
	1595	character must be associated with a location update, whether it is in
	1596	valid input, in comments, in literal strings, and so on.
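
One way to make sure no character is forgotten is to route every
character read through a single helper. The sketch below follows the
same convention as the lexer above (a newline starts a new line at
column 0, any other character advances the column by one). The names
@code{loc_t}, @code{loc_advance} and @code{loc_step} are hypothetical
and are not part of Bison.

```c
/* Hypothetical stand-in for the type of yylloc. */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} loc_t;

/* Account for one input character C in *LOC: a newline starts a new
   line at column 0, any other character advances the column by one. */
void
loc_advance (loc_t *loc, int c)
{
  if (c == '\n')
    {
      ++loc->last_line;
      loc->last_column = 0;
    }
  else
    ++loc->last_column;
}

/* Start a new token where the previous one ended. */
void
loc_step (loc_t *loc)
{
  loc->first_line = loc->last_line;
  loc->first_column = loc->last_column;
}
```

A lexer that skips comments or reads string literals would call
@code{loc_advance} for each character it consumes there as well.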
	1597
	1598	@node Multi-function Calc
	1599	@section Multi-Function Calculator: @code{mfcalc}
	1600	@cindex multi-function calculator
	1601	@cindex @code{mfcalc}
	1602	@cindex calculator, multi-function
	1603
	1604	Now that the basics of Bison have been discussed, it is time to move on to
	1605	a more advanced problem. The above calculators provided only five
	1606	functions, @samp{+}, @samp{-}, @samp{*}, @samp{/} and @samp{^}. It would
	1607	be nice to have a calculator that provides other mathematical functions such
	1608	as @code{sin}, @code{cos}, etc.
	1609
	1610	It is easy to add new operators to the infix calculator as long as they are
	1611	only single-character literals. The lexical analyzer @code{yylex} passes
	1612	back all non-numeric characters as tokens, so new grammar rules suffice for
	1613	adding a new operator. But we want something more flexible: built-in
	1614	functions whose syntax has this form:
	1615
	1616	@example
	1617	@var{function_name} (@var{argument})
	1618	@end example
	1619
	1620	@noindent
	1621	At the same time, we will add memory to the calculator, by allowing you
	1622	to create named variables, store values in them, and use them later.
	1623	Here is a sample session with the multi-function calculator:
	1624
	1625	@example
	1626	$ @kbd{mfcalc}
	1627	@kbd{pi = 3.141592653589}
	1628	3.1415926536
	1629	@kbd{sin(pi)}
	1630	0.0000000000
	1631	@kbd{alpha = beta1 = 2.3}
	1632	2.3000000000
	1633	@kbd{alpha}
	1634	2.3000000000
	1635	@kbd{ln(alpha)}
	1636	0.8329091229
	1637	@kbd{exp(ln(beta1))}
	1638	2.3000000000
	1639	$
	1640	@end example
	1641
	1642	Note that multiple assignment and nested function calls are permitted.
	1643
	1644	@menu
	1645	* Decl: Mfcalc Decl. Bison declarations for multi-function calculator.
	1646	* Rules: Mfcalc Rules. Grammar rules for the calculator.
	1647	* Symtab: Mfcalc Symtab. Symbol table management subroutines.
	1648	@end menu
	1649
	1650	@node Mfcalc Decl
	1651	@subsection Declarations for @code{mfcalc}
	1652
	1653	Here are the C and Bison declarations for the multi-function calculator.
	1654
	1655	@smallexample
	1656	%@{
	1657	#include <math.h> /* For math functions, cos(), sin(), etc. */
	1658	#include "calc.h" /* Contains definition of `symrec' */
	1659	%@}
	1660	%union @{
	1661	double val; /* For returning numbers. */
	1662	symrec *tptr; /* For returning symbol-table pointers */
	1663	@}
	1664
	1665	%token <val> NUM /* Simple double precision number */
	1666	%token <tptr> VAR FNCT /* Variable and Function */
	1667	%type <val> exp
	1668
	1669	%right '='
	1670	%left '-' '+'
	1671	%left '*' '/'
	1672	%left NEG /* Negation--unary minus */
	1673	%right '^' /* Exponentiation */
	1674
	1675	/* Grammar follows */
	1676
	1677	%%
	1678	@end smallexample
	1679
	1680	The above grammar introduces only two new features of the Bison language.
	1681	These features allow semantic values to have various data types
	1682	(@pxref{Multiple Types, ,More Than One Value Type}).
	1683
	1684	The @code{%union} declaration specifies the entire list of possible types;
	1685	this is instead of defining @code{YYSTYPE}. The allowable types are now
	1686	double-floats (for @code{exp} and @code{NUM}) and pointers to entries in
	1687	the symbol table. @xref{Union Decl, ,The Collection of Value Types}.
	1688
	1689	Since values can now have various types, it is necessary to associate a
	1690	type with each grammar symbol whose semantic value is used. These symbols
	1691	are @code{NUM}, @code{VAR}, @code{FNCT}, and @code{exp}. Their
	1692	declarations are augmented with information about their data type (placed
	1693	between angle brackets).
	1694
	1695	The Bison construct @code{%type} is used for declaring nonterminal
	1696	symbols, just as @code{%token} is used for declaring token types. We
	1697	have not used @code{%type} before because nonterminal symbols are
	1698	normally declared implicitly by the rules that define them. But
	1699	@code{exp} must be declared explicitly so we can specify its value type.
	1700	@xref{Type Decl, ,Nonterminal Symbols}.
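
To make the role of the type tags more concrete, here is a small C
sketch of a union with the two members listed in the @code{%union}
declaration, and of what an action such as the one for a rule
@code{exp: VAR} amounts to once each semantic value is accessed through
the member chosen for its symbol. The names @code{demo_symrec},
@code{value_t} and @code{value_of_var} are hypothetical, and the record
is simplified compared with the @code{symrec} type used by
@code{mfcalc}.

```c
/* Hypothetical, simplified stand-in for the symrec type. */
typedef struct demo_symrec
{
  const char *name;
  double var;
} demo_symrec;

/* A C union with the two members listed in the %union declaration. */
typedef union
{
  double val;          /* selected by <val>:  NUM and exp */
  demo_symrec *tptr;   /* selected by <tptr>: VAR and FNCT */
} value_t;

/* For a rule `exp: VAR': read the token's value through the tptr
   member and store the result through the val member. */
value_t
value_of_var (value_t var_token)
{
  value_t result;
  result.val = var_token.tptr->var;
  return result;
}
```

Without the type declarations, nothing would say which member of the
union holds the value of a given symbol.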
	1701
	1702	@node Mfcalc Rules
	1703	@subsection Grammar Rules for @code{mfcalc}
	1704
	1705	Here are the grammar rules for the multi-function calculator.
	1706	Most of them are copied directly from @code{calc}; three rules,
	1707	those which mention @code{VAR} or @code{FNCT}, are new.
	1708
	1709	@smallexample
	1710	input: /* empty */
	1711	| input line
	1712	;
	1713
	1714	line:
	1715	'\n'
	1716	| exp '\n' @{ printf ("\t%.10g\n", $1); @}
	1717	| error '\n' @{ yyerrok; @}
	1718	;
	1719
	1720	exp: NUM @{ $$ = $1; @}
	1721	| VAR @{ $$ = $1->value.var; @}
	1722	| VAR '=' exp @{ $$ = $3; $1->value.var = $3; @}
	1723	| FNCT '(' exp ')' @{ $$ = (*($1->value.fnctptr))($3); @}
	1724	| exp '+' exp @{ $$ = $1 + $3; @}
	1725	| exp '-' exp @{ $$ = $1 - $3; @}
	1726	| exp '*' exp @{ $$ = $1 * $3; @}
	1727	| exp '/' exp @{ $$ = $1 / $3; @}
	1728	| '-' exp %prec NEG @{ $$ = -$2; @}
	1729	| exp '^' exp @{ $$ = pow ($1, $3); @}
	1730	| '(' exp ')' @{ $$ = $2; @}
	1731	;
	1732	/* End of grammar */
	1733	%%
	1734	@end smallexample
	1735
	1736	@node Mfcalc Symtab
	1737	@subsection The @code{mfcalc} Symbol Table
	1738	@cindex symbol table example
	1739
	1740	The multi-function calculator requires a symbol table to keep track of the
	1741	names and meanings of variables and functions. This doesn't affect the
	1742	grammar rules (except for the actions) or the Bison declarations, but it
	1743	requires some additional C functions for support.
	1744
	1745	The symbol table itself consists of a linked list of records. Its
	1746	definition, which is kept in the header @file{calc.h}, is as follows. It
	1747	provides for either functions or variables to be placed in the table.
	1748
	1749	@smallexample
	1750	@group
	1751	/* Function type. */
	1752	typedef double (*func_t) (double);
	1753
	1754	/* Data type for links in the chain of symbols. */
	1755	struct symrec
	1756	@{
	1757	char *name; /* name of symbol */
	1758	int type; /* type of symbol: either VAR or FNCT */
	1759	union
	1760	@{
	1761	double var; /* value of a VAR */
	1762	func_t fnctptr; /* value of a FNCT */
	1763	@} value;
	1764	struct symrec *next; /* link field */
	1765	@};
	1766	@end group
	1767
	1768	@group
	1769	typedef struct symrec symrec;
	1770
	1771	/* The symbol table: a chain of `struct symrec'. */
	1772	extern symrec *sym_table;
	1773
	1774	symrec *putsym (const char *, int);
	1775	symrec *getsym (const char *);
	1776	@end group
	1777	@end smallexample
	1778
	1779	The new version of @code{main} includes a call to @code{init_table}, a
	1780	function that initializes the symbol table. Here are @code{main} and
	1781	@code{init_table}:
	1782
	1783	@smallexample
	1784	@group
	1785	#include <stdio.h>
	1786
	1787	int
	1788	main (void)
	1789	@{
	1790	init_table ();
	1791	return yyparse ();
	1792	@}
	1793	@end group
	1794
	1795	@group
	1796	void
	1797	yyerror (const char *s) /* Called by yyparse on error */
	1798	@{
	1799	printf ("%s\n", s);
	1800	@}
	1801
	1802	struct init
	1803	@{
	1804	char *fname;
	1805	double (*fnct)(double);
	1806	@};
	1807	@end group
	1808
	1809	@group
	1810	struct init arith_fncts[] =
	1811	@{
	1812	"sin", sin,
	1813	"cos", cos,
	1814	"atan", atan,
	1815	"ln", log,
	1816	"exp", exp,
	1817	"sqrt", sqrt,
	1818	0, 0
	1819	@};
	1820
	1821	/* The symbol table: a chain of `struct symrec'. */
	1822	symrec *sym_table = (symrec *) 0;
	1823	@end group
	1824
	1825	@group
	1826	/* Put arithmetic functions in table. */
	1827	void
	1828	init_table (void)
	1829	@{
	1830	int i;
	1831	symrec *ptr;
	1832	for (i = 0; arith_fncts[i].fname != 0; i++)
	1833	@{
	1834	ptr = putsym (arith_fncts[i].fname, FNCT);
	1835	ptr->value.fnctptr = arith_fncts[i].fnct;
	1836	@}
	1837	@}
	1838	@end group
	1839	@end smallexample
	1840
	1841	By simply editing the initialization list and adding the necessary include
	1842	files, you can add more functions to the calculator.
	1843
	1844	Two important functions allow look-up and installation of symbols in the
	1845	symbol table. The function @code{putsym} is passed a name and the type
	1846	(@code{VAR} or @code{FNCT}) of the object to be installed. The object is
	1847	linked to the front of the list, and a pointer to the object is returned.
	1848	The function @code{getsym} is passed the name of the symbol to look up. If
	1849	found, a pointer to that symbol is returned; otherwise zero is returned.
	1850
	1851	@smallexample
	1852	symrec *
	1853	putsym (const char *sym_name, int sym_type)
	1854	@{
	1855	symrec *ptr;
	1856	ptr = (symrec *) malloc (sizeof (symrec));
	1857	ptr->name = (char *) malloc (strlen (sym_name) + 1);
	1858	strcpy (ptr->name,sym_name);
	1859	ptr->type = sym_type;
	1860	ptr->value.var = 0; /* set value to 0 even if fctn. */
	1861	ptr->next = (struct symrec *)sym_table;
	1862	sym_table = ptr;
	1863	return ptr;
	1864	@}
	1865
	1866	symrec *
	1867	getsym (const char *sym_name)
	1868	@{
	1869	symrec *ptr;
	1870	for (ptr = sym_table; ptr != (symrec *) 0;
	1871	ptr = (symrec *)ptr->next)
	1872	if (strcmp (ptr->name,sym_name) == 0)
	1873	return ptr;
	1874	return 0;
	1875	@}
	1876	@end smallexample
	1877
	1878	The function @code{yylex} must now recognize variables, numeric values, and
	1879	the single-character arithmetic operators. Strings of alphanumeric
	1880	characters with a leading non-digit are recognized as either variables or
	1881	functions depending on what the symbol table says about them.
	1882
	1883	The string is passed to @code{getsym} for lookup in the symbol table. If
	1884	the name appears in the table, a pointer to its entry and its type
	1885	(@code{VAR} or @code{FNCT}) are returned to @code{yyparse}. If it is not
	1886	already in the table, then it is installed as a @code{VAR} using
	1887	@code{putsym}. Again, a pointer and its type (which must be @code{VAR}) are
	1888	returned to @code{yyparse}.
	1889
	1890	No change is needed in the handling of numeric values and arithmetic
	1891	operators in @code{yylex}.
	1892
	1893	@smallexample
	1894	@group
	1895	#include <ctype.h>
	1896
	1897	int
	1898	yylex (void)
	1899	@{
	1900	int c;
	1901
	1902	/* Ignore whitespace, get first nonwhite character. */
	1903	while ((c = getchar ()) == ' ' || c == '\t');
	1904
	1905	if (c == EOF)
	1906	return 0;
	1907	@end group
	1908
	1909	@group
	1910	/* Char starts a number => parse the number. */
	1911	if (c == '.' || isdigit (c))
	1912	@{
	1913	ungetc (c, stdin);
	1914	scanf ("%lf", &yylval.val);
	1915	return NUM;
	1916	@}
	1917	@end group
	1918
	1919	@group
	1920	/* Char starts an identifier => read the name. */
	1921	if (isalpha (c))
	1922	@{
	1923	symrec *s;
	1924	static char *symbuf = 0;
	1925	static int length = 0;
	1926	int i;
	1927	@end group
	1928
	1929	@group
	1930	/* Initially make the buffer long enough
	1931	for a 40-character symbol name. */
	1932	if (length == 0)
	1933	length = 40, symbuf = (char *)malloc (length + 1);
	1934
	1935	i = 0;
	1936	do
	1937	@end group
	1938	@group
	1939	@{
	1940	/* If buffer is full, make it bigger. */
	1941	if (i == length)
	1942	@{
	1943	length *= 2;
	1944	symbuf = (char *)realloc (symbuf, length + 1);
	1945	@}
	1946	/* Add this character to the buffer. */
	1947	symbuf[i++] = c;
	1948	/* Get another character. */
	1949	c = getchar ();
	1950	@}
	1951	@end group
	1952	@group
	1953	while (c != EOF && isalnum (c));
	1954
	1955	ungetc (c, stdin);
	1956	symbuf[i] = '\0';
	1957	@end group
	1958
	1959	@group
	1960	s = getsym (symbuf);
	1961	if (s == 0)
	1962	s = putsym (symbuf, VAR);
	1963	yylval.tptr = s;
	1964	return s->type;
	1965	@}
	1966
	1967	/* Any other character is a token by itself. */
	1968	return c;
	1969	@}
	1970	@end group
	1971	@end smallexample
	1972
	1973	This program is both powerful and flexible. You may easily add new
	1974	functions, and it is a simple job to modify this code to install
	1975	predefined variables such as @code{pi} or @code{e} as well.
	1976
	1977	@node Exercises
	1978	@section Exercises
	1979	@cindex exercises
	1980
	1981	@enumerate
	1982	@item
	1983	Add some new functions from @file{math.h} to the initialization list.
	1984
	1985	@item
	1986	Add another array that contains constants and their values. Then
	1987	modify @code{init_table} to add these constants to the symbol table.
	1988	It will be easiest to give the constants type @code{VAR}.
	1989
	1990	@item
	1991	Make the program report an error if the user refers to an
	1992	uninitialized variable in any way except to store a value in it.
	1993	@end enumerate
	1994
	1995	@node Grammar File
	1996	@chapter Bison Grammar Files
	1997
	1998	Bison takes as input a context-free grammar specification and produces a
	1999	C-language function that recognizes correct instances of the grammar.
	2000
	2001	The Bison grammar input file conventionally has a name ending in @samp{.y}.
	2002	@xref{Invocation, ,Invoking Bison}.
	2003
	2004	@menu
	2005	* Grammar Outline:: Overall layout of the grammar file.
	2006	* Symbols:: Terminal and nonterminal symbols.
	2007	* Rules:: How to write grammar rules.
	2008	* Recursion:: Writing recursive rules.
	2009	* Semantics:: Semantic values and actions.
	2010	* Locations:: Locations and actions.
	2011	* Declarations:: All kinds of Bison declarations are described here.
	2012	* Multiple Parsers:: Putting more than one Bison parser in one program.
	2013	@end menu
	2014
	2015	@node Grammar Outline
	2016	@section Outline of a Bison Grammar
	2017
	2018	A Bison grammar file has four main sections, shown here with the
	2019	appropriate delimiters:
	2020
	2021	@example
	2022	%@{
	2023	@var{Prologue}
	2024	%@}
	2025
	2026	@var{Bison declarations}
	2027
	2028	%%
	2029	@var{Grammar rules}
	2030	%%
	2031
	2032	@var{Epilogue}
	2033	@end example
	2034
	2035	Comments enclosed in @samp{/* @dots{} */} may appear in any of the sections.
	2036
	2037	@menu
	2038	* Prologue:: Syntax and usage of the prologue.
	2039	* Bison Declarations:: Syntax and usage of the Bison declarations section.
	2040	* Grammar Rules:: Syntax and usage of the grammar rules section.
	2041	* Epilogue:: Syntax and usage of the epilogue.
	2042	@end menu
	2043
	2044	@node Prologue, Bison Declarations, , Grammar Outline
	2045	@subsection The prologue
	2046	@cindex declarations section
	2047	@cindex Prologue
	2048	@cindex declarations
	2049
	2050	The @var{Prologue} section contains macro definitions and
	2051	declarations of functions and variables that are used in the actions in the
	2052	grammar rules. These are copied to the beginning of the parser file so
	2053	that they precede the definition of @code{yyparse}. You can use
	2054	@samp{#include} to get the declarations from a header file. If you don't
	2055	need any C declarations, you may omit the @samp{%@{} and @samp{%@}}
	2056	delimiters that bracket this section.
	2057
	2058	You may have more than one @var{Prologue} section, intermixed with the
	2059	@var{Bison declarations}. This allows you to have C and Bison
	2060	declarations that refer to each other. For example, the @code{%union}
	2061	declaration may use types defined in a header file, and you may wish to
	2062	prototype functions that take arguments of type @code{YYSTYPE}. This
	2063	can be done with two @var{Prologue} blocks, one before and one after the
	2064	@code{%union} declaration.
	2065
	2066	@smallexample
	2067	%@{
	2068	#include <stdio.h>
	2069	#include "ptypes.h"
	2070	%@}
	2071
	2072	%union @{
	2073	long n;
	2074	tree t; /* @r{@code{tree} is defined in @file{ptypes.h}.} */
	2075	@}
	2076
	2077	%@{
	2078	static void yyprint(FILE *, int, YYSTYPE);
	2079	#define YYPRINT(F, N, L) yyprint(F, N, L)
	2080	%@}
	2081
	2082	@dots{}
	2083	@end smallexample
	2084
	2085	@node Bison Declarations
	2086	@subsection The Bison Declarations Section
	2087	@cindex Bison declarations (introduction)
	2088	@cindex declarations, Bison (introduction)
	2089
	2090	The @var{Bison declarations} section contains declarations that define
	2091	terminal and nonterminal symbols, specify precedence, and so on.
	2092	In some simple grammars you may not need any declarations.
	2093	@xref{Declarations, ,Bison Declarations}.
	2094
	2095	@node Grammar Rules
	2096	@subsection The Grammar Rules Section
	2097	@cindex grammar rules section
	2098	@cindex rules section for grammar
	2099
	2100	The @dfn{grammar rules} section contains one or more Bison grammar
	2101	rules, and nothing else. @xref{Rules, ,Syntax of Grammar Rules}.
	2102
	2103	There must always be at least one grammar rule, and the first
	2104	@samp{%%} (which precedes the grammar rules) may never be omitted even
	2105	if it is the first thing in the file.
	2106
	2107	@node Epilogue, , Grammar Rules, Grammar Outline
	2108	@subsection The epilogue
	2109	@cindex additional C code section
	2110	@cindex epilogue
	2111	@cindex C code, section for additional
	2112
	2113	The @var{Epilogue} is copied verbatim to the end of the parser file, just as
	2114	the @var{Prologue} is copied to the beginning. This is the most convenient
	2115	place to put anything that you want to have in the parser file but which need
	2116	not come before the definition of @code{yyparse}. For example, the
	2117	definitions of @code{yylex} and @code{yyerror} often go here.
	2118	@xref{Interface, ,Parser C-Language Interface}.
	2119
	2120	If the last section is empty, you may omit the @samp{%%} that separates it
	2121	from the grammar rules.
	2122
	2123	The Bison parser itself contains many static variables whose names start
	2124	with @samp{yy} and many macros whose names start with @samp{YY}. It is a
	2125	good idea to avoid using any such names (except those documented in this
	2126	manual) in the epilogue of the grammar file.
	2127
	2128	@node Symbols
	2129	@section Symbols, Terminal and Nonterminal
	2130	@cindex nonterminal symbol
	2131	@cindex terminal symbol
	2132	@cindex token type
	2133	@cindex symbol
	2134
	2135	@dfn{Symbols} in Bison grammars represent the grammatical classifications
	2136	of the language.
	2137
	2138	A @dfn{terminal symbol} (also known as a @dfn{token type}) represents a
	2139	class of syntactically equivalent tokens. You use the symbol in grammar
	2140	rules to mean that a token in that class is allowed. The symbol is
	2141	represented in the Bison parser by a numeric code, and the @code{yylex}
	2142	function returns a token type code to indicate what kind of token has been
	2143	read. You don't need to know what the code value is; you can use the
	2144	symbol to stand for it.
	2145
	2146	A @dfn{nonterminal symbol} stands for a class of syntactically equivalent
	2147	groupings. The symbol name is used in writing grammar rules. By convention,
	2148	it should be all lower case.
	2149
	2150	Symbol names can contain letters, digits (not at the beginning),
	2151	underscores and periods. Periods make sense only in nonterminals.
	2152
	2153	There are three ways of writing terminal symbols in the grammar:
	2154
	2155	@itemize @bullet
	2156	@item
	2157	A @dfn{named token type} is written with an identifier, like an
	2158	identifier in C. By convention, it should be all upper case. Each
	2159	such name must be defined with a Bison declaration such as
	2160	@code{%token}. @xref{Token Decl, ,Token Type Names}.
	2161
	2162	@item
	2163	@cindex character token
	2164	@cindex literal token
	2165	@cindex single-character literal
	2166	A @dfn{character token type} (or @dfn{literal character token}) is
	2167	written in the grammar using the same syntax used in C for character
	2168	constants; for example, @code{'+'} is a character token type. A
	2169	character token type doesn't need to be declared unless you need to
	2170	specify its semantic value data type (@pxref{Value Type, ,Data Types of
	2171	Semantic Values}), associativity, or precedence (@pxref{Precedence,
	2172	,Operator Precedence}).
	2173
	2174	By convention, a character token type is used only to represent a
	2175	token that consists of that particular character. Thus, the token
	2176	type @code{'+'} is used to represent the character @samp{+} as a
	2177	token. Nothing enforces this convention, but if you depart from it,
	2178	your program will confuse other readers.
	2179
	2180	All the usual escape sequences used in character literals in C can be
	2181	used in Bison as well, but you must not use the null character as a
	2182	character literal because its numeric code, zero, is the code @code{yylex}
	2183	returns for end-of-input (@pxref{Calling Convention, ,Calling Convention
	2184	for @code{yylex}}).
	2185
	2186	@item
	2187	@cindex string token
	2188	@cindex literal string token
	2189	@cindex multicharacter literal
	2190	A @dfn{literal string token} is written like a C string constant; for
	2191	example, @code{"<="} is a literal string token. A literal string token
	2192	doesn't need to be declared unless you need to specify its semantic
	2193	value data type (@pxref{Value Type}), associativity, or precedence
	2194	(@pxref{Precedence}).
	2195
	2196	You can associate the literal string token with a symbolic name as an
	2197	alias, using the @code{%token} declaration (@pxref{Token Decl, ,Token
	2198	Declarations}). If you don't do that, the lexical analyzer has to
	2199	retrieve the token number for the literal string token from the
	2200	@code{yytname} table (@pxref{Calling Convention}).
	2201
	2202	@strong{WARNING}: literal string tokens do not work in Yacc.
	2203
	2204	By convention, a literal string token is used only to represent a token
	2205	that consists of that particular string. Thus, you should use the token
	2206	type @code{"<="} to represent the string @samp{<=} as a token. Bison
	2207	does not enforce this convention, but if you depart from it, people who
	2208	read your program will be confused.
	2209
	2210	All the escape sequences used in string literals in C can be used in
	2211	Bison as well. A literal string token must contain two or more
	2212	characters; for a token containing just one character, use a character
	2213	token (see above).
	2214	@end itemize
	2215
	2216	How you choose to write a terminal symbol has no effect on its
	2217	grammatical meaning. That depends only on where it appears in rules and
	2218	on when the lexical analyzer function returns that symbol.
	2219
	2220	The value returned by @code{yylex} is always one of the terminal symbols
	2221	(or 0 for end-of-input). Whichever way you write the token type in the
	2222	grammar rules, you write it the same way in the definition of @code{yylex}.
	2223	The numeric code for a character token type is simply the numeric code of
	2224	the character, so @code{yylex} can use the identical character constant to
	2225	generate the requisite code. Each named token type becomes a C macro in
	2226	the parser file, so @code{yylex} can use the name to stand for the code.
	2227	(This is why periods don't make sense in terminal symbols.)
	2228	@xref{Calling Convention, ,Calling Convention for @code{yylex}}.
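
The following sketch illustrates both conventions: it returns a macro
name for a named token type and the character itself for a character
token type. So that it stands alone, it scans a string instead of
standard input. The helper @code{next_token} and the value 258 given to
@code{NUM} are assumptions made for this illustration; in a real parser
the macro comes from the parser file and its value is chosen by Bison.

```c
#include <ctype.h>

/* Assumed value, for illustration only: in a real parser this macro
   is defined in the parser file generated by Bison. */
#define NUM 258

/* Return the code of the next token in *P and advance *P past it.
   Hypothetical helper; a real yylex takes no argument. */
int
next_token (const char **p)
{
  while (**p == ' ' || **p == '\t')
    ++*p;

  if (**p == '\0')
    return 0;                       /* end-of-input */

  if (isdigit ((unsigned char) **p))
    {
      while (isdigit ((unsigned char) **p))
        ++*p;
      return NUM;                   /* named token type */
    }

  return (unsigned char) *(*p)++;   /* character token type, e.g. '+' */
}
```

The same spelling, @code{NUM} or @code{'+'}, would appear in the grammar
rules that use these tokens.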
	2229
	2230	If @code{yylex} is defined in a separate file, you need to arrange for the
	2231	token-type macro definitions to be available there. Use the @samp{-d}
	2232	option when you run Bison, so that it will write these macro definitions
	2233	into a separate header file @file{@var{name}.tab.h} which you can include
	2234	in the other source files that need it. @xref{Invocation, ,Invoking Bison}.
	2235
	2236	The @code{yylex} function must use the same character set and encoding
	2237	that was used by Bison. For example, if you run Bison in an
	2238	@sc{ascii} environment, but then compile and run the resulting program
	2239	in an environment that uses an incompatible character set like
	2240	@sc{ebcdic}, the resulting program will probably not work because the
	2241	tables generated by Bison will assume @sc{ascii} numeric values for
	2242	character tokens. Portable grammars should avoid non-@sc{ascii}
	2243	character tokens, as implementations in practice often use different
	2244	and incompatible extensions in this area. However, it is standard
	2245	practice for software distributions to contain C source files that
	2246	were generated by Bison in an @sc{ascii} environment, so installers on
	2247	platforms that are incompatible with @sc{ascii} must rebuild those
	2248	files before compiling them.
	2249
	2250	The symbol @code{error} is a terminal symbol reserved for error recovery
	2251	(@pxref{Error Recovery}); you shouldn't use it for any other purpose.
	2252	In particular, @code{yylex} should never return this value. The default
	2253	value of the error token is 256, unless you explicitly assigned 256 to
	2254	one of your tokens with a @code{%token} declaration.
	2255
	2256	@node Rules
	2257	@section Syntax of Grammar Rules
	2258	@cindex rule syntax
	2259	@cindex grammar rule syntax
	2260	@cindex syntax of grammar rules
	2261
	2262	A Bison grammar rule has the following general form:
	2263
	2264	@example
	2265	@group
	2266	@var{result}: @var{components}@dots{}
	2267	;
	2268	@end group
	2269	@end example
	2270
	2271	@noindent
	2272	where @var{result} is the nonterminal symbol that this rule describes,
	2273	and @var{components} are various terminal and nonterminal symbols that
	2274	are put together by this rule (@pxref{Symbols}).
	2275
	2276	For example,
	2277
	2278	@example
	2279	@group
	2280	exp: exp '+' exp
	2281	;
	2282	@end group
	2283	@end example
	2284
	2285	@noindent
	2286	says that two groupings of type @code{exp}, with a @samp{+} token in between,
	2287	can be combined into a larger grouping of type @code{exp}.
	2288
	2289	Whitespace in rules is significant only to separate symbols. You can add
	2290	extra whitespace as you wish.
	2291
	2292	Scattered among the components can be @var{actions} that determine
	2293	the semantics of the rule. An action looks like this:
	2294
	2295	@example
	2296	@{@var{C statements}@}
	2297	@end example
	2298
	2299	@noindent
	2300	Usually there is only one action and it follows the components.
	2301	@xref{Actions}.
	2302
	2303	@findex |
	2304	Multiple rules for the same @var{result} can be written separately or can
	2305	be joined with the vertical-bar character @samp{|} as follows:
	2306
	2307	@ifinfo
	2308	@example
	2309	@var{result}: @var{rule1-components}@dots{}
	2310	| @var{rule2-components}@dots{}
	2311	@dots{}
	2312	;
	2313	@end example
	2314	@end ifinfo
	2315	@iftex
	2316	@example
	2317	@group
	2318	@var{result}: @var{rule1-components}@dots{}
	2319	| @var{rule2-components}@dots{}
	2320	@dots{}
	2321	;
	2322	@end group
	2323	@end example
	2324	@end iftex
	2325
	2326	@noindent
	2327	They are still considered distinct rules even when joined in this way.
	2328
	2329	If @var{components} in a rule is empty, it means that @var{result} can
	2330	match the empty string. For example, here is how to define a
	2331	comma-separated sequence of zero or more @code{exp} groupings:
	2332
	2333	@example
	2334	@group
	2335	expseq: /* empty */
	2336	| expseq1
	2337	;
	2338	@end group
	2339
	2340	@group
	2341	expseq1: exp
	2342	| expseq1 ',' exp
	2343	;
	2344	@end group
	2345	@end example
	2346
	2347	@noindent
	2348	It is customary to write a comment @samp{/* empty */} in each rule
	2349	with no components.
	2350
	2351	@node Recursion
	2352	@section Recursive Rules
	2353	@cindex recursive rule
	2354
	2355	A rule is called @dfn{recursive} when its @var{result} nonterminal appears
	2356	also on its right hand side. Nearly all Bison grammars need to use
	2357	recursion, because that is the only way to define a sequence of any number
	2358	of a particular thing. Consider this recursive definition of a
	2359	comma-separated sequence of one or more expressions:
	2360
	2361	@example
	2362	@group
	2363	expseq1: exp
	2364	| expseq1 ',' exp
	2365	;
	2366	@end group
	2367	@end example
	2368
	2369	@cindex left recursion
	2370	@cindex right recursion
	2371	@noindent
	2372	Since the recursive use of @code{expseq1} is the leftmost symbol in the
	2373	right hand side, we call this @dfn{left recursion}. By contrast, here
	2374	the same construct is defined using @dfn{right recursion}:
	2375
	2376	@example
	2377	@group
	2378	expseq1: exp
	2379	| exp ',' expseq1
	2380	;
	2381	@end group
	2382	@end example
	2383
	2384	@noindent
	2385	Any kind of sequence can be defined using either left recursion or right
	2386	recursion, but you should always use left recursion, because it can
	2387	parse a sequence of any number of elements with bounded stack space.
	2388	Right recursion uses up space on the Bison stack in proportion to the
	2389	number of elements in the sequence, because all the elements must be
	2390	shifted onto the stack before the rule can be applied even once.
	2391	@xref{Algorithm, ,The Bison Parser Algorithm}, for further explanation
	2392	of this.
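
The difference in stack use can be checked with a small simulation. The
C sketch below models the parser stack as a simple counter and, as a
simplification, treats each @code{exp} as a single stack entry. The
function names @code{max_depth_left} and @code{max_depth_right} are
hypothetical; this is a model of the shift and reduce steps for the two
grammars above, not Bison's own code.

```c
/* Largest stack depth reached while parsing N elements (N >= 1) with
     expseq1: exp | expseq1 ',' exp ;
   Each reduction happens as soon as its right-hand side is on the
   stack. */
int
max_depth_left (int n)
{
  int depth = 0, max = 0, i;
  for (i = 0; i < n; i++)
    {
      if (i > 0)
        depth++;                 /* shift ',' */
      depth++;                   /* shift exp */
      if (depth > max)
        max = depth;
      if (i > 0)
        depth -= 2;              /* expseq1 ',' exp  =>  expseq1 */
      /* else: exp => expseq1 leaves the depth unchanged */
    }
  return max;
}

/* Same measurement for
     expseq1: exp | exp ',' expseq1 ;
   Nothing can be reduced until the last element has been shifted. */
int
max_depth_right (int n)
{
  int depth = 0, max = 0, i;
  for (i = 0; i < n; i++)
    {
      if (i > 0)
        depth++;                 /* shift ',' */
      depth++;                   /* shift exp */
      if (depth > max)
        max = depth;
    }
  /* Only now: exp => expseq1, then n - 1 reductions of
     exp ',' expseq1 => expseq1, each popping three and pushing one. */
  for (i = 1; i < n; i++)
    depth -= 2;
  return max;
}
```

In this model the left-recursive version never needs more than three
entries, while the right-recursive version needs two entries per
element minus one.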
	2393
	2394	@cindex mutual recursion
	2395	@dfn{Indirect} or @dfn{mutual} recursion occurs when the result of the
	2396	rule does not appear directly on its right hand side, but does appear
	2397	in rules for other nonterminals which do appear on its right hand
	2398	side.
	2399
	2400	For example:
	2401
	2402	@example
	2403	@group
	2404	expr: primary
	2405	| primary '+' primary
	2406	;
	2407	@end group
	2408
	2409	@group
	2410	primary: constant
	2411	| '(' expr ')'
	2412	;
	2413	@end group
	2414	@end example
	2415
	2416	@noindent
	2417	defines two mutually-recursive nonterminals, since each refers to the
	2418	other.
	2419
	2420	@node Semantics
	2421	@section Defining Language Semantics
	2422	@cindex defining language semantics
	2423	@cindex language semantics, defining
	2424
	2425	The grammar rules for a language determine only the syntax. The semantics
	2426	are determined by the semantic values associated with various tokens and
	2427	groupings, and by the actions taken when various groupings are recognized.
	2428
	2429	For example, the calculator calculates properly because the value
	2430	associated with each expression is the proper number; it adds properly
	2431	because the action for the grouping @w{@samp{@var{x} + @var{y}}} is to add
	2432	the numbers associated with @var{x} and @var{y}.
	2433
	2434	@menu
	2435	* Value Type:: Specifying one data type for all semantic values.
	2436	* Multiple Types:: Specifying several alternative data types.
	2437	* Actions:: An action is the semantic definition of a grammar rule.
	2438	* Action Types:: Specifying data types for actions to operate on.
	2439	* Mid-Rule Actions:: Most actions go at the end of a rule.
	2440	This says when, why and how to use the exceptional
	2441	action in the middle of a rule.
	2442	@end menu
	2443
	2444	@node Value Type
	2445	@subsection Data Types of Semantic Values
	2446	@cindex semantic value type
	2447	@cindex value type, semantic
	2448	@cindex data types of semantic values
	2449	@cindex default data type
	2450
	2451	In a simple program it may be sufficient to use the same data type for
	2452	the semantic values of all language constructs. This was true in the
	2453	RPN and infix calculator examples (@pxref{RPN Calc, ,Reverse Polish
	2454	Notation Calculator}).
	2455
	2456	Bison's default is to use type @code{int} for all semantic values. To
	2457	specify some other type, define @code{YYSTYPE} as a macro, like this:
	2458
	2459	@example
	2460	#define YYSTYPE double
	2461	@end example
	2462
	2463	@noindent
	2464	This macro definition must go in the prologue of the grammar file
	2465	(@pxref{Grammar Outline, ,Outline of a Bison Grammar}).
	2466
	2467	@node Multiple Types
	2468	@subsection More Than One Value Type
	2469
	2470	In most programs, you will need different data types for different kinds
	2471	of tokens and groupings. For example, a numeric constant may need type
	2472	@code{int} or @code{long}, while a string constant needs type @code{char *},
	2473	and an identifier might need a pointer to an entry in the symbol table.
	2474
	2475	To use more than one data type for semantic values in one parser, Bison
	2476	requires you to do two things:
	2477
	2478	@itemize @bullet
	2479	@item
	2480	Specify the entire collection of possible data types, with the
	2481	@code{%union} Bison declaration (@pxref{Union Decl, ,The Collection of
	2482	Value Types}).
	2483
	2484	@item
	2485	Choose one of those types for each symbol (terminal or nonterminal) for
	2486	which semantic values are used. This is done for tokens with the
	2487	@code{%token} Bison declaration (@pxref{Token Decl, ,Token Type Names})
	2488	and for groupings with the @code{%type} Bison declaration (@pxref{Type
	2489	Decl, ,Nonterminal Symbols}).
	2490	@end itemize
	2491
	2492	@node Actions
	2493	@subsection Actions
	2494	@cindex action
	2495	@vindex $$
	2496	@vindex $@var{n}
	2497
	2498	An action accompanies a syntactic rule and contains C code to be executed
	2499	each time an instance of that rule is recognized. The task of most actions
	2500	is to compute a semantic value for the grouping built by the rule from the
	2501	semantic values associated with tokens or smaller groupings.
	2502
	2503	An action consists of C statements surrounded by braces, much like a
	2504	compound statement in C. It can be placed at any position in the rule;
	2505	it is executed at that position. Most rules have just one action at the
	2506	end of the rule, following all the components. Actions in the middle of
	2507	a rule are tricky and used only for special purposes (@pxref{Mid-Rule
	2508	Actions, ,Actions in Mid-Rule}).
	2509
	2510	The C code in an action can refer to the semantic values of the components
	2511	matched by the rule with the construct @code{$@var{n}}, which stands for
	2512	the value of the @var{n}th component. The semantic value for the grouping
	2513	being constructed is @code{$$}. (Bison translates both of these constructs
	2514	into array element references when it copies the actions into the parser
	2515	file.)
	2516
	2517	Here is a typical example:
	2518
	2519	@example
	2520	@group
	2521	exp:    @dots{}
	2522	        | exp '+' exp
	2523	            @{ $$ = $1 + $3; @}
	2524	@end group
	2525	@end example
	2526
	2527	@noindent
	2528	This rule constructs an @code{exp} from two smaller @code{exp} groupings
	2529	connected by a plus-sign token. In the action, @code{$1} and @code{$3}
	2530	refer to the semantic values of the two component @code{exp} groupings,
	2531	which are the first and third symbols on the right hand side of the rule.
	2532	The sum is stored into @code{$$} so that it becomes the semantic value of
	2533	the addition-expression just recognized by the rule. If there were a
	2534	useful semantic value associated with the @samp{+} token, it could be
	2535	referred to as @code{$2}.
	2536
	2537	Note that the vertical-bar character @samp{|} is really a rule
	2538	separator, and actions are attached to a single rule. This differs
	2539	from tools like Flex, for which @samp{|} stands for either
	2540	``or'', or ``the same action as that of the next rule''. In the
	2541	following example, the action is triggered only when @samp{b} is found:
	2542
	2543	@example
	2544	@group
	2545	a-or-b: 'a'|'b'   @{ a_or_b_found = 1; @};
	2546	@end group
	2547	@end example
	2548
	2549	@cindex default action
	2550	If you don't specify an action for a rule, Bison supplies a default:
	2551	@w{@code{$$ = $1}.} Thus, the value of the first symbol in the rule becomes
	2552	the value of the whole rule. Of course, the default action is valid only
	2553	if the two data types match. There is no meaningful default action for
	2554	an empty rule; every empty rule must have an explicit action unless the
	2555	rule's value does not matter.
	2556
	2557	@code{$@var{n}} with @var{n} zero or negative is allowed for reference
	2558	to tokens and groupings on the stack @emph{before} those that match the
	2559	current rule. This is a very risky practice, and to use it reliably
	2560	you must be certain of the context in which the rule is applied. Here
	2561	is a case in which you can use this reliably:
	2562
	2563	@example
	2564	@group
	2565	foo:      expr bar '+' expr  @{ @dots{} @}
	2566	        | expr bar '-' expr  @{ @dots{} @}
	2567	        ;
	2568	@end group
	2569
	2570	@group
	2571	bar:      /* empty */
	2572	        @{ previous_expr = $0; @}
	2573	        ;
	2574	@end group
	2575	@end example
	2576
	2577	As long as @code{bar} is used only in the fashion shown here, @code{$0}
	2578	always refers to the @code{expr} which precedes @code{bar} in the
	2579	definition of @code{foo}.
	2580
	2581	@node Action Types
	2582	@subsection Data Types of Values in Actions
	2583	@cindex action data types
	2584	@cindex data types in actions
	2585
	2586	If you have chosen a single data type for semantic values, the @code{$$}
	2587	and @code{$@var{n}} constructs always have that data type.
	2588
	2589	If you have used @code{%union} to specify a variety of data types, then you
	2590	must declare a choice among these types for each terminal or nonterminal
	2591	symbol that can have a semantic value. Then each time you use @code{$$} or
	2592	@code{$@var{n}}, its data type is determined by which symbol it refers to
	2593	in the rule. In this example,
	2594
	2595	@example
	2596	@group
	2597	exp:    @dots{}
	2598	        | exp '+' exp
	2599	            @{ $$ = $1 + $3; @}
	2600	@end group
	2601	@end example
	2602
	2603	@noindent
	2604	@code{$1} and @code{$3} refer to instances of @code{exp}, so they both
	2605	have the data type declared for the nonterminal symbol @code{exp}. If
	2606	@code{$2} were used, it would have the data type declared for the
	2607	terminal symbol @code{'+'}, whatever that might be.
	2608
	2609	Alternatively, you can specify the data type when you refer to the value,
	2610	by inserting @samp{<@var{type}>} after the @samp{$} at the beginning of the
	2611	reference. For example, if you have defined types as shown here:
	2612
	2613	@example
	2614	@group
	2615	%union @{
	2616	  int itype;
	2617	  double dtype;
	2618	@}
	2619	@end group
	2620	@end example
	2621
	2622	@noindent
	2623	then you can write @code{$<itype>1} to refer to the first subunit of the
	2624	rule as an integer, or @code{$<dtype>1} to refer to it as a double.
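
As a rough illustration only (this is not the code Bison generates), the
typed reference can be pictured as selecting one member of a C
@code{union} held in an array element. The names @code{union value},
@code{as_int} and @code{as_double} below are hypothetical and exist only
for this sketch; the member names @code{itype} and @code{dtype} come from
the @code{%union} shown above.

```c
/* Mirrors the %union above: one slot, two alternative types.  */
union value
{
  int itype;
  double dtype;
};

/* Read element INDEX of a value array as an integer, in the spirit
   of $<itype>n.  Hypothetical helper, not part of Bison.  */
static int
as_int (const union value *stack, int index)
{
  return stack[index].itype;
}

/* Read element INDEX of a value array as a double, in the spirit
   of $<dtype>n.  Hypothetical helper, not part of Bison.  */
static double
as_double (const union value *stack, int index)
{
  return stack[index].dtype;
}
```

As with any C @code{union}, reading a member other than the one most
recently stored does not convert the value, so the type named in the
reference has to match the type that was actually stored.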
	2625
	2626	@node Mid-Rule Actions
	2627	@subsection Actions in Mid-Rule
	2628	@cindex actions in mid-rule
	2629	@cindex mid-rule actions
	2630
	2631	Occasionally it is useful to put an action in the middle of a rule.
	2632	These actions are written just like usual end-of-rule actions, but they
	2633	are executed before the parser even recognizes the following components.
	2634
	2635	A mid-rule action may refer to the components preceding it using
	2636	@code{$@var{n}}, but it may not refer to subsequent components because
	2637	it is run before they are parsed.
	2638
	2639	The mid-rule action itself counts as one of the components of the rule.
	2640	This makes a difference when there is another action later in the same rule
	2641	(and usually there is another at the end): you have to count the actions
	2642	along with the symbols when working out which number @var{n} to use in
	2643	@code{$@var{n}}.
	2644
	2645	The mid-rule action can also have a semantic value. The action can set
	2646	its value with an assignment to @code{$$}, and actions later in the rule
	2647	can refer to the value using @code{$@var{n}}. Since there is no symbol
	2648	to name the action, there is no way to declare a data type for the value
	2649	in advance, so you must use the @samp{$<@dots{}>@var{n}} construct to
	2650	specify a data type each time you refer to this value.
	2651
	2652	There is no way to set the value of the entire rule with a mid-rule
	2653	action, because assignments to @code{$$} do not have that effect. The
	2654	only way to set the value for the entire rule is with an ordinary action
	2655	at the end of the rule.
	2656
	2657	Here is an example from a hypothetical compiler, handling a @code{let}
	2658	statement that looks like @samp{let (@var{variable}) @var{statement}} and
	2659	serves to create a variable named @var{variable} temporarily for the
	2660	duration of @var{statement}. To parse this construct, we must put
	2661	@var{variable} into the symbol table while @var{statement} is parsed, then
	2662	remove it afterward. Here is how it is done:
	2663
	2664	@example
	2665	@group
	2666	stmt:   LET '(' var ')'
	2667	                @{ $<context>$ = push_context ();
	2668	                  declare_variable ($3); @}
	2669	        stmt    @{ $$ = $6;
	2670	                  pop_context ($<context>5); @}
	2671	@end group
	2672	@end example
	2673
	2674	@noindent
	2675	As soon as @samp{let (@var{variable})} has been recognized, the first
	2676	action is run. It saves a copy of the current semantic context (the
	2677	list of accessible variables) as its semantic value, using alternative
	2678	@code{context} in the data-type union. Then it calls
	2679	@code{declare_variable} to add the new variable to that list. Once the
	2680	first action is finished, the embedded statement @code{stmt} can be
	2681	parsed. Note that the mid-rule action is component number 5, so the
	2682	@samp{stmt} is component number 6.
	2683
	2684	After the embedded statement is parsed, its semantic value becomes the
	2685	value of the entire @code{let}-statement. Then the semantic value from the
	2686	earlier action is used to restore the prior list of variables. This
	2687	removes the temporary @code{let}-variable from the list so that it won't
	2688	appear to exist while the rest of the program is parsed.
	2689
	2690	Taking action before a rule is completely recognized often leads to
	2691	conflicts since the parser must commit to a parse in order to execute the
	2692	action. For example, the following two rules, without mid-rule actions,
	2693	can coexist in a working parser because the parser can shift the open-brace
	2694	token and look at what follows before deciding whether there is a
	2695	declaration or not:
	2696
	2697	@example
	2698	@group
	2699	compound: '@{' declarations statements '@}'
	2700	        | '@{' statements '@}'
	2701	        ;
	2702	@end group
	2703	@end example
	2704
	2705	@noindent
	2706	But when we add a mid-rule action as follows, the rules become nonfunctional:
	2707
	2708	@example
	2709	@group
	2710	compound: @{ prepare_for_local_variables (); @}
	2711	          '@{' declarations statements '@}'
	2712	@end group
	2713	@group
	2714	        | '@{' statements '@}'
	2715	        ;
	2716	@end group
	2717	@end example
	2718
	2719	@noindent
	2720	Now the parser is forced to decide whether to run the mid-rule action
	2721	when it has read no farther than the open-brace. In other words, it
	2722	must commit to using one rule or the other, without sufficient
	2723	information to do it correctly. (The open-brace token is what is called
	2724	the @dfn{look-ahead} token at this time, since the parser is still
	2725	deciding what to do about it. @xref{Look-Ahead, ,Look-Ahead Tokens}.)
	2726
	2727	You might think that you could correct the problem by putting identical
	2728	actions into the two rules, like this:
	2729
	2730	@example
	2731	@group
	2732	compound: @{ prepare_for_local_variables (); @}
	2733	          '@{' declarations statements '@}'
	2734	        | @{ prepare_for_local_variables (); @}
	2735	          '@{' statements '@}'
	2736	        ;
	2737	@end group
	2738	@end example
	2739
	2740	@noindent
	2741	But this does not help, because Bison does not realize that the two actions
	2742	are identical. (Bison never tries to understand the C code in an action.)
	2743
	2744	If the grammar is such that a declaration can be distinguished from a
	2745	statement by the first token (which is true in C), then one solution which
	2746	does work is to put the action after the open-brace, like this:
	2747
	2748	@example
	2749	@group
	2750	compound: '@{' @{ prepare_for_local_variables (); @}
	2751	          declarations statements '@}'
	2752	        | '@{' statements '@}'
	2753	        ;
	2754	@end group
	2755	@end example
	2756
	2757	@noindent
	2758	Now the first token of the following declaration or statement,
	2759	which would in any case tell Bison which rule to use, can still do so.
	2760
	2761	Another solution is to bury the action inside a nonterminal symbol which
	2762	serves as a subroutine:
	2763
	2764	@example
	2765	@group
	2766	subroutine: /* empty */
	2767	          @{ prepare_for_local_variables (); @}
	2768	        ;
	2769
	2770	@end group
	2771
	2772	@group
	2773	compound: subroutine
	2774	          '@{' declarations statements '@}'
	2775	        | subroutine
	2776	          '@{' statements '@}'
	2777	        ;
	2778	@end group
	2779	@end example
	2780
	2781	@noindent
	2782	Now Bison can execute the action in the rule for @code{subroutine} without
	2783	deciding which rule for @code{compound} it will eventually use. Note that
	2784	the action is now at the end of its rule. Any mid-rule action can be
	2785	converted to an end-of-rule action in this way, and this is what Bison
	2786	actually does to implement mid-rule actions.
	2787
	2788	@node Locations
	2789	@section Tracking Locations
	2790	@cindex location
	2791	@cindex textual position
	2792	@cindex position, textual
	2793
	2794	Though grammar rules and semantic actions are enough to write a fully
	2795	functional parser, it can be useful to process some additional information,
	2796	especially symbol locations.
	2797
	2798	@c (terminal or not) ?
	2799
	2800	The way locations are handled is defined by providing a data type, and
	2801	actions to take when rules are matched.
	2802
	2803	@menu
	2804	* Location Type:: Specifying a data type for locations.
	2805	* Actions and Locations:: Using locations in actions.
	2806	* Location Default Action:: Defining a general way to compute locations.
	2807	@end menu
	2808
	2809	@node Location Type
	2810	@subsection Data Type of Locations
	2811	@cindex data type of locations
	2812	@cindex default location type
	2813
	2814	Defining a data type for locations is much simpler than for semantic values,
	2815	since all tokens and groupings always use the same type.
	2816
	2817	The type of locations is specified by defining a macro called @code{YYLTYPE}.
	2818	When @code{YYLTYPE} is not defined, Bison uses a default structure type with
	2819	four members:
	2820
	2821	@example
	2822	struct
	2823	@{
	2824	  int first_line;
	2825	  int first_column;
	2826	  int last_line;
	2827	  int last_column;
	2828	@}
	2829	@end example
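
A minimal sketch of supplying your own location type follows. The type
name @code{file_location} and its @code{file_name} member are assumptions
made up for this illustration; only the four line and column members come
from the default structure above. Keeping those four members is a
deliberate choice here, so that code written against the default member
names would still apply to the custom type.

```c
/* Hypothetical location type: the four default members plus a
   file name.  The extra member is an assumption for illustration
   and is not part of Bison's default structure.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
  const char *file_name;
} file_location;

/* Tell the parser to use this type for every location.  */
#define YYLTYPE file_location
```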
	2830
	2831	@node Actions and Locations
	2832	@subsection Actions and Locations
	2833	@cindex location actions
	2834	@cindex actions, location
	2835	@vindex @@$
	2836	@vindex @@@var{n}
	2837
	2838	Actions are useful not only for defining language semantics, but also for
	2839	describing how the output parser handles locations.
	2840
	2841	The most obvious way of building locations of syntactic groupings is very
	2842	similar to the way semantic values are computed. In a given rule, several
	2843	constructs can be used to access the locations of the elements being matched.
	2844	The location of the @var{n}th component of the right hand side is
	2845	@code{@@@var{n}}, while the location of the left hand side grouping is
	2846	@code{@@$}.
	2847
	2848	Here is a basic example using the default data type for locations:
	2849
	2850	@example
	2851	@group
	2852	exp:    @dots{}
	2853	        | exp '/' exp
	2854	            @{
	2855	              @@$.first_column = @@1.first_column;
	2856	              @@$.first_line = @@1.first_line;
	2857	              @@$.last_column = @@3.last_column;
	2858	              @@$.last_line = @@3.last_line;
	2859	              if ($3)
	2860	                $$ = $1 / $3;
	2861	              else
	2862	                @{
	2863	                  $$ = 1;
	2864	                  printf("Division by zero, l%d,c%d-l%d,c%d",
	2865	                         @@3.first_line, @@3.first_column,
	2866	                         @@3.last_line, @@3.last_column);
	2867	                @}
	2868	            @}
	2869	@end group
	2870	@end example
	2871
	2872	As with semantic values, there is a default action for locations that is
	2873	run each time a rule is matched. It sets the beginning of @code{@@$} to the
	2874	beginning of the first symbol, and the end of @code{@@$} to the end of the
	2875	last symbol.
	2876
	2877	With this default action, location tracking can be fully automatic. The
	2878	example above can then be rewritten simply as:
	2879
	2880	@example
	2881	@group
	2882	exp:    @dots{}
	2883	        | exp '/' exp
	2884	            @{
	2885	              if ($3)
	2886	                $$ = $1 / $3;
	2887	              else
	2888	                @{
	2889	                  $$ = 1;
	2890	                  printf("Division by zero, l%d,c%d-l%d,c%d",
	2891	                         @@3.first_line, @@3.first_column,
	2892	                         @@3.last_line, @@3.last_column);
	2893	                @}
	2894	            @}
	2895	@end group
	2896	@end example
	2897
	2898	@node Location Default Action
	2899	@subsection Default Action for Locations
	2900	@vindex YYLLOC_DEFAULT
	2901
	2902	Actually, actions are not the best place to compute locations. Since
	2903	locations are much more general than semantic values, there is room in
	2904	the output parser to redefine the default action to take for each
	2905	rule. The @code{YYLLOC_DEFAULT} macro is invoked each time a rule is
	2906	matched, before the associated action is run.
	2907
	2908	Most of the time, this macro is general enough that semantic actions
	2909	need no location-specific code at all.
	2910
	2911	The @code{YYLLOC_DEFAULT} macro takes three parameters. The first one is
	2912	the location of the grouping (the result of the computation). The second one
	2913	is an array holding locations of all right hand side elements of the rule
	2914	being matched. The last one is the number of right hand side elements.
	2915
	2916	By default, it is defined this way:
	2917
	2918	@example
	2919	@group
	2920	#define YYLLOC_DEFAULT(Current, Rhs, N)          \
	2921	  Current.first_line   = Rhs[1].first_line;      \
	2922	  Current.first_column = Rhs[1].first_column;    \
	2923	  Current.last_line    = Rhs[N].last_line;       \
	2924	  Current.last_column  = Rhs[N].last_column;
	2925	@end group
	2926	@end example
	2927
	2928	When defining @code{YYLLOC_DEFAULT}, you should consider that:
	2929
	2930	@itemize @bullet
	2931	@item
	2932	All arguments are free of side-effects. However, only the first one (the
	2933	result) should be modified by @code{YYLLOC_DEFAULT}.
	2934
	2935	@item
	2936	For consistency with semantic actions, valid indexes for the location
	2937	array range from 1 to @var{n}.
	2938	@end itemize
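
To see the effect of the default definition outside a parser, the sketch
below applies it to a plain C array. The type name @code{location} and
the function @code{span_of_rule} are hypothetical names used only for
this illustration; the macro body is the default shown above. Because
valid indexes run from 1 to @var{n}, the array passed in needs
@var{n} + 1 elements and element 0 is never read.

```c
/* Same four members as the default location structure.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} location;

/* The default definition, as shown above.  */
#define YYLLOC_DEFAULT(Current, Rhs, N)          \
  Current.first_line   = Rhs[1].first_line;      \
  Current.first_column = Rhs[1].first_column;    \
  Current.last_line    = Rhs[N].last_line;       \
  Current.last_column  = Rhs[N].last_column;

/* Compute the location spanning RHS[1] through RHS[N].
   Hypothetical helper; assumes N is at least 1.  */
static location
span_of_rule (location *rhs, int n)
{
  location result;
  YYLLOC_DEFAULT (result, rhs, n);
  return result;
}
```

For a rule whose three components start at line 1, column 5 and end at
line 2, column 4, the result starts at the beginning of the first
component and ends at the end of the third.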
	2939
	2940	@node Declarations
	2941	@section Bison Declarations
	2942	@cindex declarations, Bison
	2943	@cindex Bison declarations
	2944
	2945	The @dfn{Bison declarations} section of a Bison grammar defines the symbols
	2946	used in formulating the grammar and the data types of semantic values.
	2947	@xref{Symbols}.
	2948
	2949	All token type names (but not single-character literal tokens such as
	2950	@code{'+'} and @code{'*'}) must be declared. Nonterminal symbols must be
	2951	declared if you need to specify which data type to use for the semantic
	2952	value (@pxref{Multiple Types, ,More Than One Value Type}).
	2953
	2954	The first rule in the file also specifies the start symbol, by default.
	2955	If you want some other symbol to be the start symbol, you must declare
	2956	it explicitly (@pxref{Language and Grammar, ,Languages and Context-Free
	2957	Grammars}).
	2958
	2959	@menu
	2960	* Token Decl:: Declaring terminal symbols.
	2961	* Precedence Decl:: Declaring terminals with precedence and associativity.
	2962	* Union Decl:: Declaring the set of all semantic value types.
	2963	* Type Decl:: Declaring the choice of type for a nonterminal symbol.
	2964	* Expect Decl:: Suppressing warnings about shift/reduce conflicts.
	2965	* Start Decl:: Specifying the start symbol.
	2966	* Pure Decl:: Requesting a reentrant parser.
	2967	* Decl Summary:: Table of all Bison declarations.
	2968	@end menu
	2969
	2970	@node Token Decl
	2971	@subsection Token Type Names
	2972	@cindex declaring token type names
	2973	@cindex token type names, declaring
	2974	@cindex declaring literal string tokens
	2975	@findex %token
	2976
	2977	The basic way to declare a token type name (terminal symbol) is as follows:
	2978
	2979	@example
	2980	%token @var{name}
	2981	@end example
	2982
	2983	Bison will convert this into a @code{#define} directive in
	2984	the parser, so that the function @code{yylex} (if it is in this file)
	2985	can use the name @var{name} to stand for this token type's code.
	2986
	2987	Alternatively, you can use @code{%left}, @code{%right}, or
	2988	@code{%nonassoc} instead of @code{%token}, if you wish to specify
	2989	associativity and precedence. @xref{Precedence Decl, ,Operator
	2990	Precedence}.
	2991
	2992	You can explicitly specify the numeric code for a token type by appending
	2993	an integer value in the field immediately following the token name:
	2994
	2995	@example
	2996	%token NUM 300
	2997	@end example
	2998
	2999	@noindent
	3000	It is generally best, however, to let Bison choose the numeric codes for
	3001	all token types. Bison will automatically select codes that don't conflict
	3002	with each other or with normal characters.
	3003
	3004	In the event that the stack type is a union, you must augment the
	3005	@code{%token} or other token declaration to include the data type
	3006	alternative delimited by angle-brackets (@pxref{Multiple Types, ,More
	3007	Than One Value Type}).
	3008
	3009	For example:
	3010
	3011	@example
	3012	@group
	3013	%union @{              /* define stack type */
	3014	  double val;
	3015	  symrec *tptr;
	3016	@}
	3017	%token <val> NUM      /* define token NUM and its type */
	3018	@end group
	3019	@end example
	3020
	3021	You can associate a literal string token with a token type name by
	3022	writing the literal string at the end of a @code{%token}
	3023	declaration which declares the name. For example:
	3024
	3025	@example
	3026	%token arrow "=>"
	3027	@end example
	3028
	3029	@noindent
	3030	For example, a grammar for the C language might specify these names with
	3031	equivalent literal string tokens:
	3032
	3033	@example
	3034	%token <operator> OR "||"
	3035	%token <operator> LE 134 "<="
	3036	%left OR "<="
	3037	@end example
	3038
	3039	@noindent
	3040	Once you equate the literal string and the token name, you can use them
	3041	interchangeably in further declarations or the grammar rules. The
	3042	@code{yylex} function can use the token name or the literal string to
	3043	obtain the token type code number (@pxref{Calling Convention}).
	3044
	3045	@node Precedence Decl
	3046	@subsection Operator Precedence
	3047	@cindex precedence declarations
	3048	@cindex declaring operator precedence
	3049	@cindex operator precedence, declaring
	3050
	3051	Use the @code{%left}, @code{%right} or @code{%nonassoc} declaration to
	3052	declare a token and specify its precedence and associativity, all at
	3053	once. These are called @dfn{precedence declarations}.
	3054	@xref{Precedence, ,Operator Precedence}, for general information on
	3055	operator precedence.
	3056
	3057	The syntax of a precedence declaration is the same as that of
	3058	@code{%token}: either
	3059
	3060	@example
	3061	%left @var{symbols}@dots{}
	3062	@end example
	3063
	3064	@noindent
	3065	or
	3066
	3067	@example
	3068	%left <@var{type}> @var{symbols}@dots{}
	3069	@end example
	3070
	3071	And indeed any of these declarations serves the purposes of @code{%token}.
	3072	But in addition, they specify the associativity and relative precedence for
	3073	all the @var{symbols}:
	3074
	3075	@itemize @bullet
	3076	@item
	3077	The associativity of an operator @var{op} determines how repeated uses
	3078	of the operator nest: whether @samp{@var{x} @var{op} @var{y} @var{op}
	3079	@var{z}} is parsed by grouping @var{x} with @var{y} first or by
	3080	grouping @var{y} with @var{z} first. @code{%left} specifies
	3081	left-associativity (grouping @var{x} with @var{y} first) and
	3082	@code{%right} specifies right-associativity (grouping @var{y} with
	3083	@var{z} first). @code{%nonassoc} specifies no associativity, which
	3084	means that @samp{@var{x} @var{op} @var{y} @var{op} @var{z}} is
	3085	considered a syntax error.
	3086
	3087	@item
	3088	The precedence of an operator determines how it nests with other operators.
	3089	All the tokens declared in a single precedence declaration have equal
	3090	precedence and nest together according to their associativity.
	3091	When two tokens declared in different precedence declarations associate,
	3092	the one declared later has the higher precedence and is grouped first.
	3093	@end itemize
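
The practical difference between the two groupings is easiest to see
with an operator such as subtraction, where the grouping changes the
result. The two C functions below are hypothetical and are not produced
by Bison; they simply spell out the two nestings of
@samp{@var{x} - @var{y} - @var{z}} described above.

```c
/* Grouping x with y first, as %left specifies.  */
static int
group_left (int x, int y, int z)
{
  return (x - y) - z;
}

/* Grouping y with z first, as %right specifies.  */
static int
group_right (int x, int y, int z)
{
  return x - (y - z);
}
```

With the operands 8, 3 and 2, the left grouping gives 3 and the right
grouping gives 7, so choosing the wrong associativity silently changes
the meaning of the input.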
	3094
	3095	@node Union Decl
	3096	@subsection The Collection of Value Types
	3097	@cindex declaring value types
	3098	@cindex value types, declaring
	3099	@findex %union
	3100
	3101	The @code{%union} declaration specifies the entire collection of possible
	3102	data types for semantic values. The keyword @code{%union} is followed by a
	3103	pair of braces containing the same thing that goes inside a @code{union} in
	3104	C.
	3105
	3106	For example:
	3107
	3108	@example
	3109	@group
	3110	%union @{
	3111	  double val;
	3112	  symrec *tptr;
	3113	@}
	3114	@end group
	3115	@end example
	3116
	3117	@noindent
	3118	This says that the two alternative types are @code{double} and @code{symrec
	3119	*}. They are given names @code{val} and @code{tptr}; these names are used
	3120	in the @code{%token} and @code{%type} declarations to pick one of the types
	3121	for a terminal or nonterminal symbol (@pxref{Type Decl, ,Nonterminal Symbols}).
	3122
	3123	Note that, unlike making a @code{union} declaration in C, you do not write
	3124	a semicolon after the closing brace.
	3125
	3126	@node Type Decl
	3127	@subsection Nonterminal Symbols
	3128	@cindex declaring value types, nonterminals
	3129	@cindex value types, nonterminals, declaring
	3130	@findex %type
	3131
	3132	@noindent
	3133	When you use @code{%union} to specify multiple value types, you must
	3134	declare the value type of each nonterminal symbol for which values are
	3135	used. This is done with a @code{%type} declaration, like this:
	3136
	3137	@example
	3138	%type <@var{type}> @var{nonterminal}@dots{}
	3139	@end example
	3140
	3141	@noindent
	3142	Here @var{nonterminal} is the name of a nonterminal symbol, and
	3143	@var{type} is the name given in the @code{%union} to the alternative
	3144	that you want (@pxref{Union Decl, ,The Collection of Value Types}). You
	3145	can give any number of nonterminal symbols in the same @code{%type}
	3146	declaration, if they have the same value type. Use spaces to separate
	3147	the symbol names.
	3148
	3149	You can also declare the value type of a terminal symbol. To do this,
	3150	use the same @code{<@var{type}>} construction in a declaration for the
	3151	terminal symbol. All kinds of token declarations allow
	3152	@code{<@var{type}>}.
	3153
	3154	@node Expect Decl
	3155	@subsection Suppressing Conflict Warnings
	3156	@cindex suppressing conflict warnings
	3157	@cindex preventing warnings about conflicts
	3158	@cindex warnings, preventing
	3159	@cindex conflicts, suppressing warnings of
	3160	@findex %expect
	3161
	3162	Bison normally warns if there are any conflicts in the grammar
	3163	(@pxref{Shift/Reduce, ,Shift/Reduce Conflicts}), but most real grammars
	3164	have harmless shift/reduce conflicts which are resolved in a predictable
	3165	way and would be difficult to eliminate. It is desirable to suppress
	3166	the warning about these conflicts unless the number of conflicts
	3167	changes. You can do this with the @code{%expect} declaration.
	3168
	3169	The declaration looks like this:
	3170
	3171	@example
	3172	%expect @var{n}
	3173	@end example
	3174
	3175	Here @var{n} is a decimal integer. The declaration says there should be
	3176	no warning if there are @var{n} shift/reduce conflicts and no
	3177	reduce/reduce conflicts. An error, instead of the usual warning, is
	3178	given if there are either more or fewer conflicts, or if there are any
	3179	reduce/reduce conflicts.
	3180
	3181	In general, using @code{%expect} involves these steps:
	3182
	3183	@itemize @bullet
	3184	@item
	3185	Compile your grammar without @code{%expect}. Use the @samp{-v} option
	3186	to get a verbose list of where the conflicts occur. Bison will also
	3187	print the number of conflicts.
	3188
	3189	@item
	3190	Check each of the conflicts to make sure that Bison's default
	3191	resolution is what you really want. If not, rewrite the grammar and
	3192	go back to the beginning.
	3193
	3194	@item
	3195	Add an @code{%expect} declaration, copying the number @var{n} from the
	3196	number which Bison printed.
	3197	@end itemize
	3198
	3199	Now Bison will stop annoying you about the conflicts you have checked, but
	3200	it will report an error if changes in the grammar alter the number of
	3201	conflicts.
	3202
	3203	@node Start Decl
	3204	@subsection The Start-Symbol
	3205	@cindex declaring the start symbol
	3206	@cindex start symbol, declaring
	3207	@cindex default start symbol
	3208	@findex %start
	3209
	3210	Bison assumes by default that the start symbol for the grammar is the first
	3211	nonterminal specified in the grammar specification section. The programmer
	3212	may override this default with the @code{%start} declaration as follows:
	3213
	3214	@example
	3215	%start @var{symbol}
	3216	@end example
	3217
	3218	@node Pure Decl
	3219	@subsection A Pure (Reentrant) Parser
	3220	@cindex reentrant parser
	3221	@cindex pure parser
	3222	@findex %pure-parser
	3223
	3224	A @dfn{reentrant} program is one which does not alter in the course of
	3225	execution; in other words, it consists entirely of @dfn{pure} (read-only)
	3226	code. Reentrancy is important whenever asynchronous execution is possible;
	3227	for example, a non-reentrant program may not be safe to call from a signal
	3228	handler. In systems with multiple threads of control, a non-reentrant
	3229	program must be called only within interlocks.
	3230
	3231	Normally, Bison generates a parser which is not reentrant. This is
	3232	suitable for most uses, and it permits compatibility with YACC. (The
	3233	standard YACC interfaces are inherently nonreentrant, because they use
	3234	statically allocated variables for communication with @code{yylex},
	3235	including @code{yylval} and @code{yylloc}.)
	3236
	3237	Alternatively, you can generate a pure, reentrant parser. The Bison
	3238	declaration @code{%pure-parser} says that you want the parser to be
	3239	reentrant. It looks like this:
	3240
	3241	@example
	3242	%pure-parser
	3243	@end example
	3244
	3245	The result is that the communication variables @code{yylval} and
	3246	@code{yylloc} become local variables in @code{yyparse}, and a different
	3247	calling convention is used for the lexical analyzer function
	3248	@code{yylex}. @xref{Pure Calling, ,Calling Conventions for Pure
	3249	Parsers}, for the details of this. The variable @code{yynerrs} also
	3250	becomes local in @code{yyparse} (@pxref{Error Reporting, ,The Error
	3251	Reporting Function @code{yyerror}}). The convention for calling
	3252	@code{yyparse} itself is unchanged.
	3253
	3254	Whether the parser is pure has nothing to do with the grammar rules.
	3255	You can generate either a pure parser or a nonreentrant parser from any
	3256	valid grammar.
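
The general idea behind the two styles can be sketched in plain C. This
is an analogy only, not the interface Bison generates: the names
@code{shared_value}, @code{produce_shared} and @code{produce_local} are
hypothetical. The first function communicates through a statically
allocated variable, as the traditional interface does with
@code{yylval}; the second writes into storage supplied by its caller, so
separate callers cannot disturb each other.

```c
/* Nonreentrant style: the result lives in one statically
   allocated variable shared by every caller.  */
static int shared_value;

static void
produce_shared (int n)
{
  shared_value = n * 2;
}

/* Reentrant style: each caller passes its own storage, so two
   activations never overwrite each other's result.  */
static void
produce_local (int n, int *value)
{
  *value = n * 2;
}
```

After two calls to @code{produce_shared}, only the second result
survives; after two calls to @code{produce_local} with different
pointers, both results remain available.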
	3257
	3258	@node Decl Summary
	3259	@subsection Bison Declaration Summary
	3260	@cindex Bison declaration summary
	3261	@cindex declaration summary
	3262	@cindex summary, Bison declaration
	3263
	3264	Here is a summary of the declarations used to define a grammar:
	3265
	3266	@table @code
	3267	@item %union
	3268	Declare the collection of data types that semantic values may have
	3269	(@pxref{Union Decl, ,The Collection of Value Types}).
	3270
	3271	@item %token
	3272	Declare a terminal symbol (token type name) with no precedence
	3273	or associativity specified (@pxref{Token Decl, ,Token Type Names}).
	3274
	3275	@item %right
	3276	Declare a terminal symbol (token type name) that is right-associative
	3277	(@pxref{Precedence Decl, ,Operator Precedence}).
	3278
	3279	@item %left
	3280	Declare a terminal symbol (token type name) that is left-associative
	3281	(@pxref{Precedence Decl, ,Operator Precedence}).
	3282
	3283	@item %nonassoc
	3284	Declare a terminal symbol (token type name) that is nonassociative
	3285	(using it in a way that would be associative is a syntax error)
	3286	(@pxref{Precedence Decl, ,Operator Precedence}).
	3287
	3288	@item %type
	3289	Declare the type of semantic values for a nonterminal symbol
	3290	(@pxref{Type Decl, ,Nonterminal Symbols}).
	3291
	3292	@item %start
	3293	Specify the grammar's start symbol (@pxref{Start Decl, ,The
	3294	Start-Symbol}).
	3295
	3296	@item %expect
	3297	Declare the expected number of shift-reduce conflicts
	3298	(@pxref{Expect Decl, ,Suppressing Conflict Warnings}).
	3299	@end table
	3300
	3301	@sp 1
	3302	@noindent
	3303	In order to change the behavior of @command{bison}, use the following
	3304	directives:
	3305
	3306	@table @code
	3307	@item %debug
	3308	In the parser file, define the macro @code{YYDEBUG} to 1 if it is not
	3309	already defined, so that the debugging facilities are compiled.
	3310	@xref{Tracing, ,Tracing Your Parser}.
	3311
	3312	@item %defines
	3313	Write an extra output file containing macro definitions for the token
	3314	type names defined in the grammar and the semantic value type
	3315	@code{YYSTYPE}, as well as a few @code{extern} variable declarations.
	3316
	3317	If the parser output file is named @file{@var{name}.c} then this file
	3318	is named @file{@var{name}.h}.
	3319
	3320	This output file is essential if you wish to put the definition of
	3321	@code{yylex} in a separate source file, because @code{yylex} needs to
	3322	be able to refer to token type codes and the variable
	3323	@code{yylval}. @xref{Token Values, ,Semantic Values of Tokens}.
	3324
	3325	@item %file-prefix="@var{prefix}"
	3326	Specify a prefix to use for all Bison output file names. The names are
	3327	chosen as if the input file were named @file{@var{prefix}.y}.
	3328
	3329	@c @item %header-extension
	3330	@c Specify the extension of the parser header file generated when
	3331	@c @code{%define} or @samp{-d} are used.
	3332	@c
	3333	@c For example, a grammar file named @file{foo.ypp} and containing a
	3334	@c @code{%header-extension .hh} directive will produce a header file
	3335	@c named @file{foo.tab.hh}
	3336
	3337	@item %locations
	3338	Generate the code processing the locations (@pxref{Action Features,
	3339	,Special Features for Use in Actions}). This mode is enabled as soon as
	3340	the grammar uses the special @samp{@@@var{n}} tokens, but if your
	3341	grammar does not use them, using @samp{%locations} allows for more
	3342	accurate parse error messages.
	3343
	3344	@item %name-prefix="@var{prefix}"
	3345	Rename the external symbols used in the parser so that they start with
	3346	@var{prefix} instead of @samp{yy}. The precise list of symbols renamed
	3347	is @code{yyparse}, @code{yylex}, @code{yyerror}, @code{yynerrs},
	3348	@code{yylval}, @code{yychar}, @code{yydebug}, and possibly
	3349	@code{yylloc}. For example, if you use @samp{%name-prefix="c_"}, the
	3350	names become @code{c_parse}, @code{c_lex}, and so on. @xref{Multiple
	3351	Parsers, ,Multiple Parsers in the Same Program}.
	3352
	3353	@item %no-parser
	3354	Do not include any C code in the parser file; generate tables only. The
	3355	parser file contains just @code{#define} directives and static variable
	3356	declarations.
	3357
	3358	This option also tells Bison to write the C code for the grammar actions
	3359	into a file named @file{@var{filename}.act}, in the form of a
	3360	brace-surrounded body fit for a @code{switch} statement.
	3361
	3362	@item %no-lines
	3363	Don't generate any @code{#line} preprocessor commands in the parser
	3364	file. Ordinarily Bison writes these commands in the parser file so that
	3365	the C compiler and debuggers will associate errors and object code with
	3366	your source file (the grammar file). This directive causes them to
	3367	associate errors with the parser file, treating it as an independent source
	3368	file in its own right.
	3369
	3370	@item %output="@var{filename}"
	3371	Specify the @var{filename} for the parser file.
	3372
	3373	@item %pure-parser
	3374	Request a pure (reentrant) parser program (@pxref{Pure Decl, ,A Pure
	3375	(Reentrant) Parser}).
	3376
	3377	@c @item %source-extension
	3378	@c Specify the extension of the parser output file.
	3379	@c
	3380	@c For example, a grammar file named @file{foo.yy} and containing a
	3381	@c @code{%source-extension .cpp} directive will produce a parser file
	3382	@c named @file{foo.tab.cpp}
	3383
	3384	@item %token-table
	3385	Generate an array of token names in the parser file. The name of the
	3386	array is @code{yytname}; @code{yytname[@var{i}]} is the name of the
	3387	token whose internal Bison token code number is @var{i}. The first three
	3388	elements of @code{yytname} are always @code{"$"}, @code{"error"}, and
	3389	@code{"$illegal"}; after these come the symbols defined in the grammar
	3390	file.
	3391
	3392	For single-character literal tokens and literal string tokens, the name
	3393	in the table includes the single-quote or double-quote characters: for
	3394	example, @code{"'+'"} is a single-character literal and @code{"\"<=\""}
	3395	is a literal string token. All the characters of the literal string
	3396	token appear verbatim in the string found in the table; even
	3397	double-quote characters are not escaped. For example, if the token
	3398	consists of three characters @samp{*"*}, its string in @code{yytname}
	3399	contains @samp{"*"*"}. (In C, that would be written as
	3400	@code{"\"*\"*\""}.)
	3401
	3402	When you specify @code{%token-table}, Bison also generates macro
	3403	definitions for macros @code{YYNTOKENS}, @code{YYNNTS},
	3404	@code{YYNRULES}, and @code{YYNSTATES}:
	3405
	3406	@table @code
	3407	@item YYNTOKENS
	3408	The highest token number, plus one.
	3409	@item YYNNTS
	3410	The number of nonterminal symbols.
	3411	@item YYNRULES
	3412	The number of grammar rules.
	3413	@item YYNSTATES
	3414	The number of parser states (@pxref{Parser States}).
	3415	@end table
	3416
	3417	@item %verbose
	3418	Write an extra output file containing verbose descriptions of the
	3419	parser states and what is done for each type of look-ahead token in
	3420	that state. @xref{Understanding, , Understanding Your Parser}, for more
	3421	information.
	3422
	3423
	3424
	3425	@item %yacc
	3426	Pretend the option @option{--yacc} was given, i.e., imitate Yacc,
	3427	including its naming conventions. @xref{Bison Options}, for more information.
	3428	@end table
	3429
	3430
	3431
	3432
	3433	@node Multiple Parsers
	3434	@section Multiple Parsers in the Same Program
	3435
	3436	Most programs that use Bison parse only one language and therefore contain
	3437	only one Bison parser. But what if you want to parse more than one
	3438	language with the same program? Then you need to avoid a name conflict
	3439	between different definitions of @code{yyparse}, @code{yylval}, and so on.
	3440
	3441	The easy way to do this is to use the option @samp{-p @var{prefix}}
	3442	(@pxref{Invocation, ,Invoking Bison}). This renames the interface
	3443	functions and variables of the Bison parser to start with @var{prefix}
	3444	instead of @samp{yy}. You can use this to give each parser distinct
	3445	names that do not conflict.
	3446
	3447	The precise list of symbols renamed is @code{yyparse}, @code{yylex},
	3448	@code{yyerror}, @code{yynerrs}, @code{yylval}, @code{yychar} and
	3449	@code{yydebug}. For example, if you use @samp{-p c}, the names become
	3450	@code{cparse}, @code{clex}, and so on.
	3451
	3452	@strong{All the other variables and macros associated with Bison are not
	3453	renamed.} These others are not global; there is no conflict if the same
	3454	name is used in different parsers. For example, @code{YYSTYPE} is not
	3455	renamed, but defining this in different ways in different parsers causes
	3456	no trouble (@pxref{Value Type, ,Data Types of Semantic Values}).
	3457
	3458	The @samp{-p} option works by adding macro definitions to the beginning
	3459	of the parser source file, defining @code{yyparse} as
	3460	@code{@var{prefix}parse}, and so on. This effectively substitutes one
	3461	name for the other in the entire parser file.
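
As a rough illustration of that mechanism, the sketch below shows the
kind of macro definitions that @samp{-p c} would place at the top of the
parser file. Only three of the renamed symbols are shown, and the body
of @code{yyparse} is a placeholder rather than generated Bison code.

```c
/* Renaming of the kind `-p c' performs: the preprocessor rewrites
   every later use of these yy-names in the file.  */
#define yyparse cparse
#define yynerrs cnerrs
#define yydebug cdebug

int yynerrs;            /* Actually defines `cnerrs'.  */
int yydebug;            /* Actually defines `cdebug'.  */

/* Placeholder body; a real parser file holds the generated parser.  */
int
yyparse (void)          /* Actually defines `cparse'.  */
{
  yynerrs = 0;
  return 0;
}
```

Code in other source files then calls @code{cparse} and reads
@code{cnerrs}; the @samp{yy} spellings exist only inside the parser file.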
	3462
	3463	@node Interface
	3464	@chapter Parser C-Language Interface
	3465	@cindex C-language interface
	3466	@cindex interface
	3467
	3468	The Bison parser is actually a C function named @code{yyparse}. Here we
	3469	describe the interface conventions of @code{yyparse} and the other
	3470	functions that it needs to use.
	3471
	3472	Keep in mind that the parser uses many C identifiers starting with
	3473	@samp{yy} and @samp{YY} for internal purposes. If you use such an
	3474	identifier (aside from those in this manual) in an action or in the epilogue
	3475	in the grammar file, you are likely to run into trouble.
	3476
	3477	@menu
	3478	* Parser Function:: How to call @code{yyparse} and what it returns.
	3479	* Lexical:: You must supply a function @code{yylex}
	3480	which reads tokens.
	3481	* Error Reporting:: You must supply a function @code{yyerror}.
	3482	* Action Features:: Special features for use in actions.
	3483	@end menu
	3484
	3485	@node Parser Function
	3486	@section The Parser Function @code{yyparse}
	3487	@findex yyparse
	3488
	3489	You call the function @code{yyparse} to cause parsing to occur. This
	3490	function reads tokens, executes actions, and ultimately returns when it
	3491	encounters end-of-input or an unrecoverable syntax error. You can also
	3492	write an action which directs @code{yyparse} to return immediately
	3493	without reading further.
	3494
	3495	The value returned by @code{yyparse} is 0 if parsing was successful (return
	3496	is due to end-of-input).
	3497
	3498	The value is 1 if parsing failed (return is due to a syntax error).
	3499
	3500	In an action, you can cause immediate return from @code{yyparse} by using
	3501	these macros:
	3502
	3503	@table @code
	3504	@item YYACCEPT
	3505	@findex YYACCEPT
	3506	Return immediately with value 0 (to report success).
	3507
	3508	@item YYABORT
	3509	@findex YYABORT
	3510	Return immediately with value 1 (to report failure).
	3511	@end table
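
A caller typically only needs to test the returned value against zero.
The following sketch assumes a stand-in @code{yyparse} so that it
compiles on its own; @code{stub_status} and @code{run_parser} are
hypothetical names introduced for the illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Stand-in for the generated parser; `stub_status' is a hypothetical
   knob used only to exercise both outcomes.  */
static int stub_status = 0;

int
yyparse (void)
{
  return stub_status;
}

/* Run the parser once and map its result to a process exit code.  */
int
run_parser (void)
{
  int status = yyparse ();

  if (status != 0)
    {
      fprintf (stderr, "parsing failed\n");
      return EXIT_FAILURE;
    }
  return EXIT_SUCCESS;
}
```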
	3512
	3513	@node Lexical
	3514	@section The Lexical Analyzer Function @code{yylex}
	3515	@findex yylex
	3516	@cindex lexical analyzer
	3517
	3518	The @dfn{lexical analyzer} function, @code{yylex}, recognizes tokens from
	3519	the input stream and returns them to the parser. Bison does not create
	3520	this function automatically; you must write it so that @code{yyparse} can
	3521	call it. The function is sometimes referred to as a lexical scanner.
	3522
	3523	In simple programs, @code{yylex} is often defined at the end of the Bison
	3524	grammar file. If @code{yylex} is defined in a separate source file, you
	3525	need to arrange for the token-type macro definitions to be available there.
	3526	To do this, use the @samp{-d} option when you run Bison, so that it will
	3527	write these macro definitions into a separate header file
	3528	@file{@var{name}.tab.h} which you can include in the other source files
	3529	that need it. @xref{Invocation, ,Invoking Bison}.
	3530
	3531	@menu
	3532	* Calling Convention:: How @code{yyparse} calls @code{yylex}.
	3533	* Token Values:: How @code{yylex} must return the semantic value
	3534	of the token it has read.
	3535	* Token Positions:: How @code{yylex} must return the text position
	3536	(line number, etc.) of the token, if the
	3537	actions want that.
	3538	* Pure Calling:: How the calling convention differs
	3539	in a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser}).
	3540	@end menu
	3541
	3542	@node Calling Convention
	3543	@subsection Calling Convention for @code{yylex}
	3544
	3545	The value that @code{yylex} returns must be the numeric code for the type
	3546	of token it has just found, or 0 for end-of-input.
	3547
	3548	When a token is referred to in the grammar rules by a name, that name
	3549	in the parser file becomes a C macro whose definition is the proper
	3550	numeric code for that token type. So @code{yylex} can use the name
	3551	to indicate that type. @xref{Symbols}.
	3552
	3553	When a token is referred to in the grammar rules by a character literal,
	3554	the numeric code for that character is also the code for the token type.
	3555	So @code{yylex} can simply return that character code. The null character
	3556	must not be used this way, because its code is zero and that is what
	3557	signifies end-of-input.
	3558
	3559	Here is an example showing these things:
	3560
	3561	@example
	3562	int
	3563	yylex (void)
	3564	@{
	3565	  @dots{}
	3566	  if (c == EOF)    /* Detect end of file.  */
	3567	    return 0;
	3568	  @dots{}
	3569	  if (c == '+' || c == '-')
	3570	    return c;      /* Assume token type for `+' is '+'.  */
	3571	  @dots{}
	3572	  return INT;      /* Return the type of the token.  */
	3573	  @dots{}
	3574	@}
	3575	@end example
	3576
	3577	@noindent
	3578	This interface has been designed so that the output from the @code{lex}
	3579	utility can be used without change as the definition of @code{yylex}.
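
For a hand-written scanner, the fragment above can be filled out as in
the following sketch. It assumes that the input is a string reached
through a hypothetical pointer @code{input}, and that @code{INT} has the
code 258; in a real program the code comes from the macro definitions
Bison writes.

```c
#include <ctype.h>

#define INT 258                 /* Hypothetical code; Bison assigns the real one.  */

static const char *input = "";  /* Hypothetical input buffer.  */

int
yylex (void)
{
  /* Skip blanks between tokens.  */
  while (*input == ' ' || *input == '\t')
    input++;

  if (*input == '\0')           /* End of input.  */
    return 0;

  if (isdigit ((unsigned char) *input))
    {
      /* Consume the whole run of digits as one token.  */
      while (isdigit ((unsigned char) *input))
        input++;
      return INT;
    }

  /* Any other character is its own token type.  */
  return (unsigned char) *input++;
}
```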
	3580
	3581	If the grammar uses literal string tokens, there are two ways that
	3582	@code{yylex} can determine the token type codes for them:
	3583
	3584	@itemize @bullet
	3585	@item
	3586	If the grammar defines symbolic token names as aliases for the
	3587	literal string tokens, @code{yylex} can use these symbolic names like
	3588	all others. In this case, the use of the literal string tokens in
	3589	the grammar file has no effect on @code{yylex}.
	3590
	3591	@item
	3592	@code{yylex} can find the multicharacter token in the @code{yytname}
	3593	table. The index of the token in the table is the token type's code.
	3594	The name of a multicharacter token is recorded in @code{yytname} with a
	3595	double-quote, the token's characters, and another double-quote. The
	3596	token's characters are not escaped in any way; they appear verbatim in
	3597	the contents of the string in the table.
	3598
	3599	Here's code for looking up a token in @code{yytname}, assuming that the
	3600	characters of the token are stored in @code{token_buffer}.
	3601
	3602	@smallexample
	3603	for (i = 0; i < YYNTOKENS; i++)
	3604	  @{
	3605	    if (yytname[i] != 0
	3606	        && yytname[i][0] == '"'
	3607	        && ! strncmp (yytname[i] + 1, token_buffer,
	3608	                      strlen (token_buffer))
	3609	        && yytname[i][strlen (token_buffer) + 1] == '"'
	3610	        && yytname[i][strlen (token_buffer) + 2] == 0)
	3611	      break;
	3612	  @}
	3613	@end smallexample
	3614
	3615	The @code{yytname} table is generated only if you use the
	3616	@code{%token-table} declaration. @xref{Decl Summary}.
	3617	@end itemize
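
The table lookup described in the second item can be wrapped in a
function, as in this sketch. The table contents and the value of
@code{YYNTOKENS} are made up for the illustration, and
@code{lookup_string_token} is a hypothetical name; a real parser file
supplies its own @code{yytname}.

```c
#include <string.h>

/* Made-up table contents: three fixed entries, then the tokens
   of an imaginary grammar.  */
static const char *const yytname[] =
  { "$", "error", "$illegal", "'+'", "\"<=\"", "\">=\"" };
#define YYNTOKENS 6

/* Return the token code of the literal string token whose characters
   are in TOKEN_BUFFER, or -1 if the table has no such token.  */
int
lookup_string_token (const char *token_buffer)
{
  size_t len = strlen (token_buffer);
  int i;

  for (i = 0; i < YYNTOKENS; i++)
    if (yytname[i] != 0
        && yytname[i][0] == '"'
        && strncmp (yytname[i] + 1, token_buffer, len) == 0
        && yytname[i][len + 1] == '"'
        && yytname[i][len + 2] == '\0')
      return i;

  return -1;
}
```

Note that @code{strncmp} returns zero on a match, so the comparison must
test for zero.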
	3618
	3619	@node Token Values
	3620	@subsection Semantic Values of Tokens
	3621
	3622	@vindex yylval
	3623	In an ordinary (non-reentrant) parser, the semantic value of the token must
	3624	be stored into the global variable @code{yylval}. When you are using
	3625	just one data type for semantic values, @code{yylval} has that type.
	3626	Thus, if the type is @code{int} (the default), you might write this in
	3627	@code{yylex}:
	3628
	3629	@example
	3630	@group
	3631	@dots{}
	3632	yylval = value; /* Put value onto Bison stack. */
	3633	return INT; /* Return the type of the token. */
	3634	@dots{}
	3635	@end group
	3636	@end example
	3637
	3638	When you are using multiple data types, @code{yylval}'s type is a union
	3639	made from the @code{%union} declaration (@pxref{Union Decl, ,The
	3640	Collection of Value Types}). So when you store a token's value, you
	3641	must use the proper member of the union. If the @code{%union}
	3642	declaration looks like this:
	3643
	3644	@example
	3645	@group
	3646	%union @{
	3647	int intval;
	3648	double val;
	3649	symrec *tptr;
	3650	@}
	3651	@end group
	3652	@end example
	3653
	3654	@noindent
	3655	then the code in @code{yylex} might look like this:
	3656
	3657	@example
	3658	@group
	3659	@dots{}
	3660	yylval.intval = value; /* Put value onto Bison stack. */
	3661	return INT; /* Return the type of the token. */
	3662	@dots{}
	3663	@end group
	3664	@end example
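
The sketch below shows the same idea in compilable form. It is only an
illustration: the union omits @code{tptr} so that no @code{symrec}
definition is needed, the token codes are invented, and
@code{number_token} is a hypothetical helper that @code{yylex} might
call once it has isolated the text of a number.

```c
#include <stdlib.h>

/* Mirrors the %union above, minus `tptr'.  */
typedef union
{
  int intval;
  double val;
} YYSTYPE;

#define INT 258                 /* Hypothetical token codes.  */
#define NUM 259

YYSTYPE yylval;                 /* Normally defined by the parser file.  */

/* Classify TEXT as an integer or a floating-point literal and store
   its value in the matching member of yylval.  */
int
number_token (const char *text)
{
  char *end;
  long n = strtol (text, &end, 10);

  if (*end == '\0')             /* The whole text was an integer.  */
    {
      yylval.intval = (int) n;
      return INT;
    }

  yylval.val = strtod (text, NULL);
  return NUM;
}
```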
	3665
	3666	@node Token Positions
	3667	@subsection Textual Positions of Tokens
	3668
	3669	@vindex yylloc
	3670	If you are using the @samp{@@@var{n}}-feature (@pxref{Locations, ,
	3671	Tracking Locations}) in actions to keep track of the
	3672	textual locations of tokens and groupings, then you must provide this
	3673	information in @code{yylex}. The function @code{yyparse} expects to
	3674	find the textual location of a token just parsed in the global variable
	3675	@code{yylloc}. So @code{yylex} must store the proper data in that
	3676	variable.
	3677
	3678	By default, the value of @code{yylloc} is a structure and you need only
	3679	initialize the members that are going to be used by the actions. The
	3680	four members are called @code{first_line}, @code{first_column},
	3681	@code{last_line} and @code{last_column}. Note that the use of this
	3682	feature makes the parser noticeably slower.
	3683
	3684	@tindex YYLTYPE
	3685	The data type of @code{yylloc} has the name @code{YYLTYPE}.
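
One possible way to maintain @code{yylloc} is sketched below. The
structure is a local stand-in with the four default members, and
@code{update_location} is a hypothetical helper that @code{yylex} would
call with the text of each token it matches. The sketch assumes the
convention that @code{last_column} names the column just past the
token; other conventions are equally workable.

```c
/* Local stand-in; in a real parser the generated file supplies
   YYLTYPE and yylloc.  */
typedef struct
{
  int first_line;
  int first_column;
  int last_line;
  int last_column;
} YYLTYPE;

YYLTYPE yylloc = { 1, 1, 1, 1 };

/* Advance yylloc over TEXT.  The new token starts where the
   previous one ended.  */
void
update_location (const char *text)
{
  const char *p;

  yylloc.first_line = yylloc.last_line;
  yylloc.first_column = yylloc.last_column;

  for (p = text; *p != '\0'; p++)
    if (*p == '\n')
      {
        yylloc.last_line++;
        yylloc.last_column = 1;
      }
    else
      yylloc.last_column++;
}
```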
	3686
	3687	@node Pure Calling
	3688	@subsection Calling Conventions for Pure Parsers
	3689
	3690	When you use the Bison declaration @code{%pure-parser} to request a
	3691	pure, reentrant parser, the global communication variables @code{yylval}
	3692	and @code{yylloc} cannot be used. (@xref{Pure Decl, ,A Pure (Reentrant)
	3693	Parser}.) In such parsers the two global variables are replaced by
	3694	pointers passed as arguments to @code{yylex}. You must declare them as
	3695	shown here, and pass the information back by storing it through those
	3696	pointers.
	3697
	3698	@example
	3699	int
	3700	yylex (YYSTYPE *lvalp, YYLTYPE *llocp)
	3701	@{
	3702	  @dots{}
	3703	  *lvalp = value;  /* Put value onto Bison stack.  */
	3704	  return INT;      /* Return the type of the token.  */
	3705	  @dots{}
	3706	@}
	3707	@end example
	3708
	3709	If the grammar file does not use the @samp{@@} constructs to refer to
	3710	textual positions, then the type @code{YYLTYPE} will not be defined. In
	3711	this case, omit the second argument; @code{yylex} will be called with
	3712	only one argument.
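
For the one-argument case, a complete scanner might look like the
following sketch. It assumes the default @code{int} semantic value
type, an invented code for @code{INT}, and a hypothetical string input
reached through @code{input}.

```c
#include <ctype.h>

#define INT 258                 /* Hypothetical token code.  */
typedef int YYSTYPE;            /* The default semantic value type.  */

/* Hypothetical input cursor.  A fully reentrant scanner would keep
   this in caller-owned state as well.  */
static const char *input = "";

int
yylex (YYSTYPE *lvalp)
{
  while (*input == ' ')
    input++;

  if (*input == '\0')
    return 0;                   /* End of input.  */

  if (isdigit ((unsigned char) *input))
    {
      int value = 0;
      while (isdigit ((unsigned char) *input))
        value = value * 10 + (*input++ - '0');
      *lvalp = value;           /* Store through the pointer.  */
      return INT;
    }

  return (unsigned char) *input++;
}
```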
	3713
	3714	@vindex YYPARSE_PARAM
	3715	If you use a reentrant parser, you can optionally pass additional
	3716	parameter information to it in a reentrant way. To do so, define the
	3717	macro @code{YYPARSE_PARAM} as a variable name. This modifies the
	3718	@code{yyparse} function to accept one argument, of type @code{void *},
	3719	with that name.
	3720
	3721	When you call @code{yyparse}, pass the address of an object, casting the
	3722	address to @code{void *}. The grammar actions can refer to the contents
	3723	of the object by casting the pointer value back to its proper type and
	3724	then dereferencing it. Here's an example. Write this in the parser:
	3725
	3726	@example
	3727	%@{
	3728	struct parser_control
	3729	@{
	3730	int nastiness;
	3731	int randomness;
	3732	@};
	3733
	3734	#define YYPARSE_PARAM parm
	3735	%@}
	3736	@end example
	3737
	3738	@noindent
	3739	Then call the parser like this:
	3740
	3741	@example
	3742	struct parser_control
	3743	@{
	3744	int nastiness;
	3745	int randomness;
	3746	@};
	3747
	3748	@dots{}
	3749
	3750	@{
	3751	struct parser_control foo;
	3752	@dots{} /* @r{Store proper data in @code{foo}.} */
	3753	value = yyparse ((void *) &foo);
	3754	@dots{}
	3755	@}
	3756	@end example
	3757
	3758	@noindent
	3759	In the grammar actions, use expressions like this to refer to the data:
	3760
	3761	@example
	3762	((struct parser_control *) parm)->randomness
	3763	@end example
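
The pieces above fit together as in this self-contained sketch. The
@code{yyparse} here is a stand-in written by hand to show the resulting
signature and one access of the kind an action would make; it is not
generated code, and @code{parse_with_control} is a hypothetical wrapper.

```c
struct parser_control
{
  int nastiness;
  int randomness;
};

#define YYPARSE_PARAM parm

/* Stand-in for the generated parser: with YYPARSE_PARAM defined,
   yyparse takes one `void *' argument with that name.  */
int
yyparse (void *YYPARSE_PARAM)
{
  /* What an action would write to reach the caller's data.  */
  ((struct parser_control *) parm)->randomness++;
  return 0;
}

/* Hypothetical wrapper that hides the cast from callers.  */
int
parse_with_control (struct parser_control *ctl)
{
  return yyparse ((void *) ctl);
}
```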
	3764
	3765	@vindex YYLEX_PARAM
	3766	If you wish to pass the additional parameter data to @code{yylex},
	3767	define the macro @code{YYLEX_PARAM} just like @code{YYPARSE_PARAM}, as
	3768	shown here:
	3769
	3770	@example
	3771	%@{
	3772	struct parser_control
	3773	@{
	3774	int nastiness;
	3775	int randomness;
	3776	@};
	3777
	3778	#define YYPARSE_PARAM parm
	3779	#define YYLEX_PARAM parm
	3780	%@}
	3781	@end example
	3782
	3783	You should then define @code{yylex} to accept one additional
	3784	argument---the value of @code{parm}. (This makes either two or three
	3785	arguments in total, depending on whether an argument of type
	3786	@code{YYLTYPE} is passed.) You can declare the argument as a pointer to
	3787	the proper object type, or you can declare it as @code{void *} and
	3788	access the contents as shown above.
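
A minimal sketch of such a @code{yylex} follows, for a grammar without
@code{YYLTYPE}, so that the function takes two arguments. It declares
the extra argument as a pointer to the proper object type. The token
code is invented, and returning the @code{randomness} member as the
semantic value is only a way to show that the data arrives.

```c
struct parser_control
{
  int nastiness;
  int randomness;
};

#define INT 258                 /* Hypothetical token code.  */
typedef int YYSTYPE;            /* The default semantic value type.  */

/* The second argument receives the value of `parm'.  Declaring it
   with the proper pointer type avoids a cast in the body.  */
int
yylex (YYSTYPE *lvalp, struct parser_control *parm)
{
  *lvalp = parm->randomness;
  return INT;
}
```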
	3789
	3790	You can use @samp{%pure-parser} to request a reentrant parser without
	3791	also using @code{YYPARSE_PARAM}. Then you should call @code{yyparse}
	3792	with no arguments, as usual.
	3793
	3794	@node Error Reporting
	3795	@section The Error Reporting Function @code{yyerror}
	3796	@cindex error reporting function
	3797	@findex yyerror
	3798	@cindex parse error
	3799	@cindex syntax error
	3800
	3801	The Bison parser detects a @dfn{parse error} or @dfn{syntax error}
	3802	whenever it reads a token which cannot satisfy any syntax rule. An
	3803	action in the grammar can also explicitly proclaim an error, using the
	3804	macro @code{YYERROR} (@pxref{Action Features, ,Special Features for Use
	3805	in Actions}).
	3806
	3807	The Bison parser expects to report the error by calling an error
	3808	reporting function named @code{yyerror}, which you must supply. It is
	3809	called by @code{yyparse} whenever a syntax error is found, and it
	3810	receives one argument, a message string. For a parse error, the string is normally
	3811	@w{@code{"parse error"}}.
	3812
	3813	@findex YYERROR_VERBOSE
	3814	If you define the macro @code{YYERROR_VERBOSE} in the Bison declarations
	3815	section (@pxref{Bison Declarations, ,The Bison Declarations Section}),
	3816	then Bison provides a more verbose and specific error message string
	3817	instead of just plain @w{@code{"parse error"}}. It doesn't matter what
	3818	definition you use for @code{YYERROR_VERBOSE}, just whether you define
	3819	it.
	3820
	3821	The parser can detect one other kind of error: stack overflow. This
	3822	happens when the input contains constructions that are very deeply
	3823	nested. It isn't likely you will encounter this, since the Bison
	3824	parser extends its stack automatically up to a very large limit. But
	3825	if overflow happens, @code{yyparse} calls @code{yyerror} in the usual
	3826	fashion, except that the argument string is @w{@code{"parser stack
	3827	overflow"}}.
	3828
	3829	The following definition suffices in simple programs:
	3830
	3831	@example
	3832	@group
	3833	void
	3834	yyerror (char *s)
	3835	@{
	3836	@end group
	3837	@group
	3838	fprintf (stderr, "%s\n", s);
	3839	@}
	3840	@end group
	3841	@end example
	3842
	3843	After @code{yyerror} returns to @code{yyparse}, the latter will attempt
	3844	error recovery if you have written suitable error recovery grammar rules
	3845	(@pxref{Error Recovery}). If recovery is impossible, @code{yyparse} will
	3846	immediately return 1.
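
A program that wants more context in its messages can keep some state
of its own, as in this sketch. The variables @code{current_line} and
@code{errors_reported} are hypothetical and would have to be maintained
by the program (for instance, by @code{yylex}); Bison does not provide
them. The argument is declared @code{const char *} here, which accepts
the same message strings.

```c
#include <stdio.h>

/* Hypothetical program state, not supplied by Bison.  */
static int current_line = 1;
static int errors_reported = 0;

void
yyerror (const char *s)
{
  errors_reported++;
  fprintf (stderr, "line %d: %s\n", current_line, s);
}
```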
	3847
	3848	@vindex yynerrs
	3849	The variable @code{yynerrs} contains the number of syntax errors
	3850	encountered so far. Normally this variable is global; but if you
	3851	request a pure parser (@pxref{Pure Decl, ,A Pure (Reentrant) Parser})
	3852	then it is a local variable which only the actions can access.
	3853
	3854	@node Action Features
	3855	@section Special Features for Use in Actions
	3856	@cindex summary, action features
	3857	@cindex action features summary
	3858
	3859	Here is a table of Bison constructs, variables and macros that
	3860	are useful in actions.
	3861
	3862	@table @samp
	3863	@item $$
	3864	Acts like a variable that contains the semantic value for the
	3865	grouping made by the current rule. @xref{Actions}.
	3866
	3867	@item $@var{n}
	3868	Acts like a variable that contains the semantic value for the
	3869	@var{n}th component of the current rule. @xref{Actions}.
	3870
	3871	@item $<@var{typealt}>$
	3872	Like @code{$$} but specifies alternative @var{typealt} in the union
	3873	specified by the @code{%union} declaration. @xref{Action Types, ,Data
	3874	Types of Values in Actions}.
	3875
	3876	@item $<@var{typealt}>@var{n}
	3877	Like @code{$@var{n}} but specifies alternative @var{typealt} in the
	3878	union specified by the @code{%union} declaration.
	3879	@xref{Action Types, ,Data Types of Values in Actions}.
	3880
	3881	@item YYABORT;
	3882	Return immediately from @code{yyparse}, indicating failure.
	3883	@xref{Parser Function, ,The Parser Function @code{yyparse}}.
	3884
	3885	@item YYACCEPT;
	3886	Return immediately from @code{yyparse}, indicating success.
	3887	@xref{Parser Function, ,The Parser Function @code{yyparse}}.
	3888
	3889	@item YYBACKUP (@var{token}, @var{value});
	3890	@findex YYBACKUP
	3891	Unshift a token. This macro is allowed only for rules that reduce
	3892	a single value, and only when there is no look-ahead token.
	3893	It installs a look-ahead token with token type @var{token} and
	3894	semantic value @var{value}; then it discards the value that was
	3895	going to be reduced by this rule.
	3896
	3897	If the macro is used when it is not valid, such as when there is
	3898	a look-ahead token already, then it reports a syntax error with
	3899	a message @samp{cannot back up} and performs ordinary error
	3900	recovery.
	3901
	3902	In either case, the rest of the action is not executed.
	3903
	3904	@item YYEMPTY
	3905	@vindex YYEMPTY
	3906	Value stored in @code{yychar} when there is no look-ahead token.
	3907
	3908	@item YYERROR;
	3909	@findex YYERROR
	3910	Cause an immediate syntax error. This statement initiates error
	3911	recovery just as if the parser itself had detected an error; however, it
	3912	does not call @code{yyerror}, and does not print any message. If you
	3913	want to print an error message, call @code{yyerror} explicitly before
	3914	the @samp{YYERROR;} statement. @xref{Error Recovery}.
	3915
	3916	@item YYRECOVERING
	3917	This macro stands for an expression that has the value 1 when the parser
	3918	is recovering from a syntax error, and 0 the rest of the time.
	3919	@xref{Error Recovery}.
	3920
	3921	@item yychar
	3922	Variable containing the current look-ahead token. (In a pure parser,
	3923	this is actually a local variable within @code{yyparse}.) When there is
	3924	no look-ahead token, the value @code{YYEMPTY} is stored in the variable.
	3925	@xref{Look-Ahead, ,Look-Ahead Tokens}.
	3926
	3927	@item yyclearin;
	3928	Discard the current look-ahead token. This is useful primarily in
	3929	error rules. @xref{Error Recovery}.
	3930
	3931	@item yyerrok;
	3932	Resume generating error messages immediately for subsequent syntax
	3933	errors. This is useful primarily in error rules.
	3934	@xref{Error Recovery}.
	3935
	3936	@item @@$
	3937	@findex @@$
	3938	Acts like a structure variable containing information on the textual position
	3939	of the grouping made by the current rule. @xref{Locations, ,
	3940	Tracking Locations}.
	3941
	3942	@c Check if those paragraphs are still useful or not.
	3943
	3944	@c @example
	3945	@c struct @{
	3946	@c int first_line, last_line;
	3947	@c int first_column, last_column;
	3948	@c @};
	3949	@c @end example
	3950
	3951	@c Thus, to get the starting line number of the third component, you would
	3952	@c use @samp{@@3.first_line}.
	3953
	3954	@c In order for the members of this structure to contain valid information,
	3955	@c you must make @code{yylex} supply this information about each token.
	3956	@c If you need only certain members, then @code{yylex} need only fill in
	3957	@c those members.
	3958
	3959	@c The use of this feature makes the parser noticeably slower.
	3960
	3961	@item @@@var{n}
	3962	@findex @@@var{n}
	3963	Acts like a structure variable containing information on the textual position
	3964	of the @var{n}th component of the current rule. @xref{Locations, ,
	3965	Tracking Locations}.
	3966
	3967	@end table
	3968
	3969	@node Algorithm
	3970	@chapter The Bison Parser Algorithm
	3971	@cindex Bison parser algorithm
	3972	@cindex algorithm of parser
	3973	@cindex shifting
	3974	@cindex reduction
	3975	@cindex parser stack
	3976	@cindex stack, parser
	3977
	3978	As Bison reads tokens, it pushes them onto a stack along with their
	3979	semantic values. The stack is called the @dfn{parser stack}. Pushing a
	3980	token is traditionally called @dfn{shifting}.
	3981
	3982	For example, suppose the infix calculator has read @samp{1 + 5 *}, with a
	3983	@samp{3} to come. The stack will have four elements, one for each token
	3984	that was shifted.
	3985
	3986	But the stack does not always have an element for each token read. When
	3987	the last @var{n} tokens and groupings shifted match the components of a
	3988	grammar rule, they can be combined according to that rule. This is called
	3989	@dfn{reduction}. Those tokens and groupings are replaced on the stack by a
	3990	single grouping whose symbol is the result (left hand side) of that rule.
	3991	Running the rule's action is part of the process of reduction, because this
	3992	is what computes the semantic value of the resulting grouping.
	3993
	3994	For example, if the infix calculator's parser stack contains this:
	3995
	3996	@example
	3997	1 + 5 * 3
	3998	@end example
	3999
	4000	@noindent
	4001	and the next input token is a newline character, then the last three
	4002	elements can be reduced to 15 via the rule:
	4003
	4004	@example
	4005	expr: expr '*' expr;
	4006	@end example
	4007
	4008	@noindent
	4009	Then the stack contains just these three elements:
	4010
	4011	@example
	4012	1 + 15
	4013	@end example
	4014
	4015	@noindent
	4016	At this point, another reduction can be made, resulting in the single value
	4017	16. Then the newline token can be shifted.
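
The sequence of shifts and reductions just described can be imitated by
a small hand-written evaluator. The sketch below is not the table-driven
algorithm that Bison generates; it is a simplified simulation, under the
assumption that the input holds only non-negative integers, @samp{+}
and @samp{*}, and it uses operator precedence to decide when to reduce.
All names in it are hypothetical.

```c
#include <ctype.h>

#define STACK_MAX 32

/* Precedence of a binary operator; 0 means "not an operator".
   Any such character, including the end of input, ends the parse.  */
static int
prec (int op)
{
  return op == '*' ? 2 : op == '+' ? 1 : 0;
}

/* Evaluate S by shifting values and operators onto explicit stacks
   and reducing `expr OP expr' whenever the look-ahead does not bind
   tighter than the operator on top.  Return -1 on malformed input.  */
int
shift_reduce_eval (const char *s)
{
  int vals[STACK_MAX], ops[STACK_MAX];
  int nvals = 0, nops = 0;

  for (;;)
    {
      int look;

      while (*s == ' ')
        s++;

      if (isdigit ((unsigned char) *s))
        {
          int n = 0;
          while (isdigit ((unsigned char) *s))
            n = n * 10 + (*s++ - '0');
          if (nvals == STACK_MAX)
            return -1;
          vals[nvals++] = n;            /* Shift a number.  */
          continue;
        }

      look = *s;                        /* Look-ahead token.  */

      while (nops > 0 && prec (ops[nops - 1]) >= prec (look))
        {
          int rhs, lhs, op;
          if (nvals < 2)
            return -1;
          rhs = vals[--nvals];          /* Reduce: pop three elements, */
          lhs = vals[--nvals];
          op = ops[--nops];
          vals[nvals++] = op == '*' ? lhs * rhs : lhs + rhs; /* push one.  */
        }

      if (prec (look) == 0)
        return nvals == 1 ? vals[0] : -1;

      ops[nops++] = look;               /* Shift the operator.  */
      s++;
    }
}
```

With the input @samp{1 + 5 * 3}, the multiplication is reduced first
and the addition second, as in the example above.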
	4018
	4019	The parser tries, by shifts and reductions, to reduce the entire input down
	4020	to a single grouping whose symbol is the grammar's start-symbol
	4021	(@pxref{Language and Grammar, ,Languages and Context-Free Grammars}).
	4022
	4023	This kind of parser is known in the literature as a bottom-up parser.
	4024
	4025	@menu
	4026	* Look-Ahead:: Parser looks one token ahead when deciding what to do.
	4027	* Shift/Reduce:: Conflicts: when either shifting or reduction is valid.
	4028	* Precedence:: Operator precedence works by resolving conflicts.
	4029	* Contextual Precedence:: When an operator's precedence depends on context.
	4030	* Parser States:: The parser is a finite-state machine with a stack.
	4031	* Reduce/Reduce:: When two rules are applicable in the same situation.
	4032	* Mystery Conflicts:: Reduce/reduce conflicts that look unjustified.
	4033	* Stack Overflow:: What happens when stack gets full. How to avoid it.
	4034	@end menu
	4035
	4036	@node Look-Ahead
	4037	@section Look-Ahead Tokens
	4038	@cindex look-ahead token
	4039
	4040	The Bison parser does @emph{not} always reduce immediately as soon as the
	4041	last @var{n} tokens and groupings match a rule. This is because such a
	4042	simple strategy is inadequate to handle most languages. Instead, when a
	4043	reduction is possible, the parser sometimes ``looks ahead'' at the next
	4044	token in order to decide what to do.
	4045
	4046	When a token is read, it is not immediately shifted; first it becomes the
	4047	@dfn{look-ahead token}, which is not on the stack. Now the parser can
	4048	perform one or more reductions of tokens and groupings on the stack, while
	4049	the look-ahead token remains off to the side. When no more reductions
	4050	should take place, the look-ahead token is shifted onto the stack. This
	4051	does not mean that all possible reductions have been done; depending on the
	4052	token type of the look-ahead token, some rules may choose to delay their
	4053	application.
	4054
	4055	Here is a simple case where look-ahead is needed. These three rules define
	4056	expressions which contain binary addition operators and postfix unary
	4057	factorial operators (@samp{!}), and allow parentheses for grouping.
	4058
	4059	@example
	4060	@group
	4061	expr: term '+' expr
	4062	| term
	4063	;
	4064	@end group
	4065
	4066	@group
	4067	term: '(' expr ')'
	4068	| term '!'
	4069	| NUMBER
	4070	;
	4071	@end group
	4072	@end example
	4073
	4074	Suppose that the tokens @w{@samp{1 + 2}} have been read and shifted; what
	4075	should be done? If the following token is @samp{)}, then the first three
	4076	tokens must be reduced to form an @code{expr}. This is the only valid
	4077	course, because shifting the @samp{)} would produce a sequence of symbols
	4078	@w{@code{term ')'}}, and no rule allows this.
	4079
	4080	If the following token is @samp{!}, then it must be shifted immediately so
	4081	that @w{@samp{2 !}} can be reduced to make a @code{term}. If instead the
	4082	parser were to reduce before shifting, @w{@samp{1 + 2}} would become an
	4083	@code{expr}. It would then be impossible to shift the @samp{!} because
	4084	doing so would produce on the stack the sequence of symbols @code{expr
	4085	'!'}. No rule allows that sequence.
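
The decisions traced above can be mimicked by a small hand-written
recognizer in C. This is only a sketch, not the table-driven code that
Bison generates: the names @code{recognize}, @code{classify} and
@code{enum sym} are invented for this illustration, and the stack holds
bare symbols rather than parser states. The case to notice is the one for
a @code{term} on top of the stack, where the look-ahead token alone decides
between shifting and reducing to @code{expr}.

```c
#include <ctype.h>

/* Grammar symbols.  All names here are hypothetical, chosen for this
   sketch; they are not part of Bison.  */
enum sym { NUMBER, PLUS, BANG, LPAREN, RPAREN, END, EXPR, TERM, BAD };

enum sym
classify (char c)
{
  if (isdigit ((unsigned char) c))
    return NUMBER;
  switch (c)
    {
    case '+':  return PLUS;
    case '!':  return BANG;
    case '(':  return LPAREN;
    case ')':  return RPAREN;
    case '\0': return END;
    default:   return BAD;
    }
}

/* Return 1 if INPUT is a valid expr, 0 otherwise.  */
int
recognize (const char *input)
{
  enum sym stack[128];
  int n = 0;                    /* number of symbols on the stack */

  for (;;)
    {
      enum sym la;              /* the look-ahead token; not on the stack */

      while (*input == ' ')
        input++;
      la = classify (*input);
      if (la == BAD)
        return 0;

      if (n >= 1 && stack[n - 1] == NUMBER)
        stack[n - 1] = TERM;                    /* term: NUMBER */
      else if (n >= 2 && stack[n - 2] == TERM && stack[n - 1] == BANG)
        n -= 1;                                 /* term: term '!' */
      else if (n >= 3 && stack[n - 3] == LPAREN && stack[n - 2] == EXPR
               && stack[n - 1] == RPAREN)
        {
          n -= 2;                               /* term: '(' expr ')' */
          stack[n - 1] = TERM;
        }
      else if (n >= 3 && stack[n - 3] == TERM && stack[n - 2] == PLUS
               && stack[n - 1] == EXPR)
        {
          n -= 2;                               /* expr: term '+' expr */
          stack[n - 1] = EXPR;
        }
      else if (n >= 1 && stack[n - 1] == TERM && la != BANG && la != PLUS)
        stack[n - 1] = EXPR;    /* expr: term, only if the look-ahead allows */
      else if (n == 1 && stack[0] == EXPR && la == END)
        return 1;                               /* accept */
      else if (la != END && n < 128)
        {
          stack[n++] = la;                      /* shift the look-ahead */
          do
            input++;
          while (la == NUMBER && isdigit ((unsigned char) *input));
        }
      else
        return 0;                               /* syntax error */
    }
}
```

Under these assumptions, @code{recognize ("1 + 2!")} returns 1 because the
@samp{!} is shifted before any reduction to @code{expr}, while
@code{recognize ("1 + 2 )")} returns 0.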
	4086
	4087	@vindex yychar
	4088	The current look-ahead token is stored in the variable @code{yychar}.
	4089	@xref{Action Features, ,Special Features for Use in Actions}.
	4090
	4091	@node Shift/Reduce
	4092	@section Shift/Reduce Conflicts
	4093	@cindex conflicts
	4094	@cindex shift/reduce conflicts
	4095	@cindex dangling @code{else}
	4096	@cindex @code{else}, dangling
	4097
	4098	Suppose we are parsing a language which has if-then and if-then-else
	4099	statements, with a pair of rules like this:
	4100
	4101	@example
	4102	@group
	4103	if_stmt:
	4104	IF expr THEN stmt
	4105	| IF expr THEN stmt ELSE stmt
	4106	;
	4107	@end group
	4108	@end example
	4109
	4110	@noindent
	4111	Here we assume that @code{IF}, @code{THEN} and @code{ELSE} are
	4112	terminal symbols for specific keyword tokens.
	4113
	4114	When the @code{ELSE} token is read and becomes the look-ahead token, the
	4115	contents of the stack (assuming the input is valid) are just right for
	4116	reduction by the first rule. But it is also legitimate to shift the
	4117	@code{ELSE}, because that would lead to eventual reduction by the second
	4118	rule.
	4119
	4120	This situation, where either a shift or a reduction would be valid, is
	4121	called a @dfn{shift/reduce conflict}. Bison is designed to resolve
	4122	these conflicts by choosing to shift, unless otherwise directed by
	4123	operator precedence declarations. To see the reason for this, let's
	4124	contrast it with the other alternative.
	4125
	4126	Since the parser prefers to shift the @code{ELSE}, the result is to attach
	4127	the else-clause to the innermost if-statement, making these two inputs
	4128	equivalent:
	4129
	4130	@example
	4131	if x then if y then win (); else lose;
	4132
	4133	if x then do; if y then win (); else lose; end;
	4134	@end example
	4135
	4136	But if the parser chose to reduce when possible rather than shift, the
	4137	result would be to attach the else-clause to the outermost if-statement,
	4138	making these two inputs equivalent:
	4139
	4140	@example
	4141	if x then if y then win (); else lose;
	4142
	4143	if x then do; if y then win (); end; else lose;
	4144	@end example
	4145
	4146	The conflict exists because the grammar as written is ambiguous: either
	4147	parsing of the simple nested if-statement is legitimate. The established
	4148	convention is that these ambiguities are resolved by attaching the
	4149	else-clause to the innermost if-statement; this is what Bison accomplishes
	4150	by choosing to shift rather than reduce. (It would ideally be cleaner to
	4151	write an unambiguous grammar, but that is very hard to do in this case.)
	4152	This particular ambiguity was first encountered in the specifications of
	4153	Algol 60 and is called the ``dangling @code{else}'' ambiguity.
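
The effect of preferring the shift can be imitated in a few lines of C.
The sketch below is a recursive-descent recognizer, not an LALR(1) parser,
and it assumes a made-up one-letter encoding of the input: @samp{i} stands
for @samp{IF expr THEN}, @samp{e} for @code{ELSE}, and @samp{s} for any
simple statement. The functions @code{parse_stmt} and @code{else_owner}
are hypothetical names used only here. Consuming the @code{ELSE} at the
first opportunity corresponds to shifting it, so the innermost open
if-statement receives the else-clause.

```c
#include <stddef.h>

/* Hypothetical one-letter tokens: 'i' stands for "IF expr THEN",
   'e' for ELSE, and 's' for any simple statement.  */

/* Parse one stmt starting at P, nested inside DEPTH if-statements.
   Return the position after it, or NULL on a syntax error.  */
const char *
parse_stmt (const char *p, int depth, int *else_depth)
{
  if (*p == 's')
    return p + 1;
  if (*p != 'i')
    return NULL;
  p = parse_stmt (p + 1, depth + 1, else_depth);
  if (p != NULL && *p == 'e')
    {
      /* Taking the ELSE here, at the first opportunity, plays the
         role of the shift: the innermost open "if" gets it.  */
      if (*else_depth == 0)
        *else_depth = depth + 1;
      p = parse_stmt (p + 1, depth + 1, else_depth);
    }
  return p;
}

/* Return the nesting level of the if-statement that received the
   first ELSE (1 is the outermost), 0 if there is no ELSE, and -1 if
   INPUT is not a single valid stmt.  */
int
else_owner (const char *input)
{
  int else_depth = 0;
  const char *end = parse_stmt (input, 0, &else_depth);

  if (end == NULL || *end != '\0')
    return -1;
  return else_depth;
}
```

With this encoding, the nested input @samp{if x then if y then s else s}
becomes @code{"iises"}, and @code{else_owner ("iises")} returns 2: the
else-clause belongs to the inner if-statement.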
	4154
	4155	To avoid warnings from Bison about predictable, legitimate shift/reduce
	4156	conflicts, use the @code{%expect @var{n}} declaration. There will be no
	4157	warning as long as the number of shift/reduce conflicts is exactly @var{n}.
	4158	@xref{Expect Decl, ,Suppressing Conflict Warnings}.
	4159
	4160	The definition of @code{if_stmt} above is solely to blame for the
	4161	conflict, but the conflict does not actually appear without additional
	4162	rules. Here is a complete Bison input file that manifests the
	4163	conflict:
	4164
	4165	@example
	4166	@group
	4167	%token IF THEN ELSE variable
	4168	%%
	4169	@end group
	4170	@group
	4171	stmt: expr
	4172	| if_stmt
	4173	;
	4174	@end group
	4175
	4176	@group
	4177	if_stmt:
	4178	IF expr THEN stmt
	4179	| IF expr THEN stmt ELSE stmt
	4180	;
	4181	@end group
	4182
	4183	expr: variable
	4184	;
	4185	@end example
	4186
	4187	@node Precedence
	4188	@section Operator Precedence
	4189	@cindex operator precedence
	4190	@cindex precedence of operators
	4191
	4192	Another situation where shift/reduce conflicts appear is in arithmetic
	4193	expressions. Here shifting is not always the preferred resolution; the
	4194	Bison declarations for operator precedence allow you to specify when to
	4195	shift and when to reduce.
	4196
	4197	@menu
	4198	* Why Precedence:: An example showing why precedence is needed.
	4199	* Using Precedence:: How to specify precedence in Bison grammars.
	4200	* Precedence Examples:: How these features are used in the previous example.
	4201	* How Precedence:: How they work.
	4202	@end menu
	4203
	4204	@node Why Precedence
	4205	@subsection When Precedence is Needed
	4206
	4207	Consider the following ambiguous grammar fragment (ambiguous because the
	4208	input @w{@samp{1 - 2 * 3}} can be parsed in two different ways):
	4209
	4210	@example
	4211	@group
	4212	expr: expr '-' expr
	4213	| expr '*' expr
	4214	| expr '<' expr
	4215	| '(' expr ')'
	4216	@dots{}
	4217	;
	4218	@end group
	4219	@end example
	4220
	4221	@noindent
	4222	Suppose the parser has seen the tokens @samp{1}, @samp{-} and @samp{2};
	4223	should it reduce them via the rule for the subtraction operator? It
	4224	depends on the next token. Of course, if the next token is @samp{)}, we
	4225	must reduce; shifting is invalid because no single rule can reduce the
	4226	token sequence @w{@samp{- 2 )}} or anything starting with that. But if
	4227	the next token is @samp{*} or @samp{<}, we have a choice: either
	4228	shifting or reduction would allow the parse to complete, but with
	4229	different results.
	4230
	4231	To decide which one Bison should do, we must consider the results. If
	4232	the next operator token @var{op} is shifted, then its operation must be
	4233	reduced first, before there is another opportunity to reduce the difference.
	4234	The result is (in effect) @w{@samp{1 - (2 @var{op} 3)}}. On the other
	4235	hand, if the subtraction is reduced before shifting @var{op}, the result
	4236	is @w{@samp{(1 - 2) @var{op} 3}}. Clearly, then, the choice of shift or
	4237	reduce should depend on the relative precedence of the operators
	4238	@samp{-} and @var{op}: @samp{*} should be shifted first, but not
	4239	@samp{<}.
	4240
	4241	@cindex associativity
	4242	What about input such as @w{@samp{1 - 2 - 5}}; should this be
	4243	@w{@samp{(1 - 2) - 5}} or should it be @w{@samp{1 - (2 - 5)}}? For most
	4244	operators we prefer the former, which is called @dfn{left association}.
	4245	The latter alternative, @dfn{right association}, is desirable for
	4246	assignment operators. The choice of left or right association is a
	4247	matter of whether the parser chooses to shift or reduce when the stack
	4248	contains @w{@samp{1 - 2}} and the look-ahead token is @samp{-}: shifting
	4249	yields right association, and reducing yields left association.
	4250
	4251	@node Using Precedence
	4252	@subsection Specifying Operator Precedence
	4253	@findex %left
	4254	@findex %right
	4255	@findex %nonassoc
	4256
	4257	Bison allows you to specify these choices with the operator precedence
	4258	declarations @code{%left} and @code{%right}. Each such declaration
	4259	contains a list of tokens, which are operators whose precedence and
	4260	associativity are being declared. The @code{%left} declaration makes all
	4261	those operators left-associative and the @code{%right} declaration makes
	4262	them right-associative. A third alternative is @code{%nonassoc}, which
	4263	declares that it is a syntax error to find the same operator twice ``in a
	4264	row''.
	4265
	4266	The relative precedence of different operators is controlled by the
	4267	order in which they are declared. The first @code{%left}, @code{%right}
	4268	or @code{%nonassoc} declaration in the file declares the operators whose
	4269	precedence is lowest, the next such declaration declares the operators
	4270	whose precedence is a little higher, and so on.
	4271
	4272	@node Precedence Examples
	4273	@subsection Precedence Examples
	4274
	4275	In our example, we would want the following declarations:
	4276
	4277	@example
	4278	%left '<'
	4279	%left '-'
	4280	%left '*'
	4281	@end example
	4282
	4283	In a more complete example, which supports other operators as well, we
	4284	would declare them in groups of equal precedence. For example, @code{'+'} is
	4285	declared with @code{'-'}:
	4286
	4287	@example
	4288	%left '<' '>' '=' NE LE GE
	4289	%left '+' '-'
	4290	%left '*' '/'
	4291	@end example
	4292
	4293	@noindent
	4294	(Here @code{NE} and so on stand for the operators for ``not equal''
	4295	and so on. We assume that these tokens are more than one character long
	4296	and therefore are represented by names, not character literals.)
	4297
	4298	@node How Precedence
	4299	@subsection How Precedence Works
	4300
	4301	The first effect of the precedence declarations is to assign precedence
	4302	levels to the terminal symbols declared. The second effect is to assign
	4303	precedence levels to certain rules: each rule gets its precedence from
	4304	the last terminal symbol mentioned in the components. (You can also
	4305	specify explicitly the precedence of a rule. @xref{Contextual
	4306	Precedence, ,Context-Dependent Precedence}.)
	4307
	4308	Finally, the resolution of conflicts works by comparing the precedence
	4309	of the rule being considered with that of the look-ahead token. If the
	4310	token's precedence is higher, the choice is to shift. If the rule's
	4311	precedence is higher, the choice is to reduce. If they have equal
	4312	precedence, the choice is made based on the associativity of that
	4313	precedence level. The verbose output file made by @samp{-v}
	4314	(@pxref{Invocation, ,Invoking Bison}) says how each conflict was
	4315	resolved.
	4316
	4317	Not all rules and not all tokens have precedence. If either the rule or
	4318	the look-ahead token has no precedence, then the default is to shift.
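
The comparison described in this subsection fits in one small C function.
What follows is a sketch under stated assumptions, not Bison's own code:
the names @code{resolve}, @code{enum assoc} and @code{enum action} are
invented here, precedence levels are small integers that grow with
precedence, and level 0 is taken to mean ``no precedence declared''.

```c
/* Hypothetical types for this sketch; Bison's internal tables differ.  */
enum assoc { ASSOC_UNDECLARED, ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC };
enum action { ACT_SHIFT, ACT_REDUCE, ACT_ERROR };

/* Decide a shift/reduce conflict between a rule of precedence
   RULE_PREC and a look-ahead token of precedence TOKEN_PREC.
   A level of 0 means that no precedence was declared.  */
enum action
resolve (int rule_prec, int token_prec, enum assoc token_assoc)
{
  if (rule_prec == 0 || token_prec == 0)
    return ACT_SHIFT;           /* no precedence: the default is to shift */
  if (token_prec > rule_prec)
    return ACT_SHIFT;           /* the token binds more tightly */
  if (rule_prec > token_prec)
    return ACT_REDUCE;          /* the rule binds more tightly */

  /* Equal precedence: associativity of that level decides.  */
  switch (token_assoc)
    {
    case ASSOC_LEFT:
      return ACT_REDUCE;
    case ASSOC_RIGHT:
      return ACT_SHIFT;
    case ASSOC_NONASSOC:
      return ACT_ERROR;         /* same operator twice "in a row" */
    default:
      return ACT_SHIFT;
    }
}
```

Taking @samp{<} as level 1, @samp{-} as level 2 and @samp{*} as level 3,
as in the declarations of the previous subsection, the stack contents
@w{@samp{1 - 2}} give @code{resolve (2, 3, ASSOC_LEFT)}, a shift, when the
look-ahead is @samp{*}, and @code{resolve (2, 1, ASSOC_LEFT)}, a reduction,
when it is @samp{<}.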
	4319
	4320	@node Contextual Precedence
	4321	@section Context-Dependent Precedence
	4322	@cindex context-dependent precedence
	4323	@cindex unary operator precedence
	4324	@cindex precedence, context-dependent
	4325	@cindex precedence, unary operator
	4326	@findex %prec
	4327
	4328	Often the precedence of an operator depends on the context. This sounds
	4329	outlandish at first, but it is really very common. For example, a minus
	4330	sign typically has a very high precedence as a unary operator, and a
	4331	somewhat lower precedence (lower than multiplication) as a binary operator.
	4332
	4333	The Bison precedence declarations, @code{%left}, @code{%right} and
	4334	@code{%nonassoc}, can only be used once for a given token; so a token has
	4335	only one precedence declared in this way. For context-dependent
	4336	precedence, you need to use an additional mechanism: the @code{%prec}
	4337	modifier for rules.
	4338
	4339	The @code{%prec} modifier declares the precedence of a particular rule by
	4340	specifying a terminal symbol whose precedence should be used for that rule.
	4341	It's not necessary for that symbol to appear otherwise in the rule. The
	4342	modifier's syntax is:
	4343
	4344	@example
	4345	%prec @var{terminal-symbol}
	4346	@end example
	4347
	4348	@noindent
	4349	and it is written after the components of the rule. Its effect is to
	4350	assign the rule the precedence of @var{terminal-symbol}, overriding
	4351	the precedence that would be deduced for it in the ordinary way. The
	4352	altered rule precedence then affects how conflicts involving that rule
	4353	are resolved (@pxref{Precedence, ,Operator Precedence}).
	4354
	4355	Here is how @code{%prec} solves the problem of unary minus. First, declare
	4356	a precedence for a fictitious terminal symbol named @code{UMINUS}. There
	4357	are no tokens of this type, but the symbol serves to stand for its
	4358	precedence:
	4359
	4360	@example
	4361	@dots{}
	4362	%left '+' '-'
	4363	%left '*'
	4364	%left UMINUS
	4365	@end example
	4366
	4367	Now the precedence of @code{UMINUS} can be used in specific rules:
	4368
	4369	@example
	4370	@group
	4371	exp: @dots{}
	4372	| exp '-' exp
	4373	@dots{}
	4374	| '-' exp %prec UMINUS
	4375	@end group
	4376	@end example
	4377
	4378	@node Parser States
	4379	@section Parser States
	4380	@cindex finite-state machine
	4381	@cindex parser state
	4382	@cindex state (of parser)
	4383
	4384	The function @code{yyparse} is implemented using a finite-state machine.
	4385	The values pushed on the parser stack are not simply token type codes; they
	4386	represent the entire sequence of terminal and nonterminal symbols at or
	4387	near the top of the stack. The current state collects all the information
	4388	about previous input which is relevant to deciding what to do next.
	4389
	4390	Each time a look-ahead token is read, the current parser state together
	4391	with the type of look-ahead token are looked up in a table. This table
	4392	entry can say, ``Shift the look-ahead token.'' In this case, it also
	4393	specifies the new parser state, which is pushed onto the top of the
	4394	parser stack. Or it can say, ``Reduce using rule number @var{n}.''
	4395	This means that a certain number of tokens or groupings are taken off
	4396	the top of the stack, and replaced by one grouping. In other words,
	4397	that number of states are popped from the stack, and one new state is
	4398	pushed.
	4399
	4400	There is one other alternative: the table can say that the look-ahead token
	4401	is erroneous in the current state. This causes error processing to begin
	4402	(@pxref{Error Recovery}).
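
The loop described above can be sketched in C for a toy grammar. The
grammar (@samp{list: 'a' | list 'a'}), the hand-built table, and every
identifier below are assumptions made for this illustration; the tables
that Bison generates are encoded quite differently. Each table entry says
shift, reduce, accept or error, exactly as in the description.

```c
/* Toy grammar, chosen for this sketch only:
     rule 1:  list: 'a'
     rule 2:  list: list 'a'  */
enum { TOK_A, TOK_END, NTOKENS };
enum kind { ERR, SHIFT, REDUCE, ACCEPT };
struct entry { enum kind kind; int arg; };

/* table[state][look-ahead]; ARG is the new state for SHIFT and the
   rule number for REDUCE.  */
static const struct entry table[4][NTOKENS] = {
  /* state 0: nothing seen  */ { { SHIFT, 1 },  { ERR, 0 } },
  /* state 1: 'a'           */ { { REDUCE, 1 }, { REDUCE, 1 } },
  /* state 2: list          */ { { SHIFT, 3 },  { ACCEPT, 0 } },
  /* state 3: list 'a'      */ { { REDUCE, 2 }, { REDUCE, 2 } },
};
static const int rule_length[] = { 0, 1, 2 };
/* State to push after reducing to "list", indexed by the state that
   is uncovered on the stack.  Only state 0 can be uncovered here.  */
static const int goto_list[4] = { 2, -1, -1, -1 };

/* Return the number of reductions performed, or -1 on a syntax error.  */
int
parse (const char *input)
{
  int stack[64];                /* stack of states, not of symbols */
  int top = 0;
  int reductions = 0;

  stack[0] = 0;
  for (;;)
    {
      struct entry e;
      int tok;

      if (*input == 'a')
        tok = TOK_A;
      else if (*input == '\0')
        tok = TOK_END;
      else
        return -1;

      e = table[stack[top]][tok];
      switch (e.kind)
        {
        case SHIFT:
          if (top + 1 >= 64)
            return -1;          /* stack overflow */
          stack[++top] = e.arg; /* push the new state */
          input++;              /* the look-ahead is consumed */
          break;
        case REDUCE:
          top -= rule_length[e.arg];    /* pop one state per component */
          stack[top + 1] = goto_list[stack[top]];
          top++;                        /* push one state for the grouping */
          reductions++;
          break;
        case ACCEPT:
          return reductions;
        default:
          return -1;            /* erroneous look-ahead in this state */
        }
    }
}
```

With this table, @code{parse ("aaa")} performs three reductions (rule 1
once, then rule 2 twice), and @code{parse ("")} returns @minus{}1 because
the end of input is erroneous in state 0.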
	4403
	4404	@node Reduce/Reduce
	4405	@section Reduce/Reduce Conflicts
	4406	@cindex reduce/reduce conflict
	4407	@cindex conflicts, reduce/reduce
	4408
	4409	A reduce/reduce conflict occurs if there are two or more rules that apply
	4410	to the same sequence of input. This usually indicates a serious error
	4411	in the grammar.
	4412
	4413	For example, here is an erroneous attempt to define a sequence
	4414	of zero or more @code{word} groupings.
	4415
	4416	@example
	4417	sequence: /* empty */
	4418	@{ printf ("empty sequence\n"); @}
	4419	| maybeword
	4420	| sequence word
	4421	@{ printf ("added word %s\n", $2); @}
	4422	;
	4423
	4424	maybeword: /* empty */
	4425	@{ printf ("empty maybeword\n"); @}
	4426	| word
	4427	@{ printf ("single word %s\n", $1); @}
	4428	;
	4429	@end example
	4430
	4431	@noindent
	4432	The error is an ambiguity: there is more than one way to parse a single
	4433	@code{word} into a @code{sequence}. It could be reduced to a
	4434	@code{maybeword} and then into a @code{sequence} via the second rule.
	4435	Alternatively, nothing-at-all could be reduced into a @code{sequence}
	4436	via the first rule, and this could be combined with the @code{word}
	4437	using the third rule for @code{sequence}.
	4438
	4439	There is also more than one way to reduce nothing-at-all into a
	4440	@code{sequence}. This can be done directly via the first rule,
	4441	or indirectly via @code{maybeword} and then the second rule.
	4442
	4443	You might think that this is a distinction without a difference, because it
	4444	does not change whether any particular input is valid or not. But it does
	4445	affect which actions are run. One parsing order runs the second rule's
	4446	action; the other runs the first rule's action and the third rule's action.
	4447	In this example, the output of the program changes.
	4448
	4449	Bison resolves a reduce/reduce conflict by choosing to use the rule that
	4450	appears first in the grammar, but it is very risky to rely on this. Every
	4451	reduce/reduce conflict must be studied and usually eliminated. Here is the
	4452	proper way to define @code{sequence}:
	4453
	4454	@example
	4455	sequence: /* empty */
	4456	@{ printf ("empty sequence\n"); @}
	4457	| sequence word
	4458	@{ printf ("added word %s\n", $2); @}
	4459	;
	4460	@end example
	4461
	4462	Here is another common error that yields a reduce/reduce conflict:
	4463
	4464	@example
	4465	sequence: /* empty */
	4466	| sequence words
	4467	| sequence redirects
	4468	;
	4469
	4470	words: /* empty */
	4471	| words word
	4472	;
	4473
	4474	redirects: /* empty */
	4475	| redirects redirect
	4476	;
	4477	@end example
	4478
	4479	@noindent
	4480	The intention here is to define a sequence which can contain either
	4481	@code{word} or @code{redirect} groupings. The individual definitions of
	4482	@code{sequence}, @code{words} and @code{redirects} are error-free, but the
	4483	three together make a subtle ambiguity: even an empty input can be parsed
	4484	in infinitely many ways!
	4485
	4486	Consider: nothing-at-all could be a @code{words}. Or it could be two
	4487	@code{words} in a row, or three, or any number. It could equally well be a
	4488	@code{redirects}, or two, or any number. Or it could be a @code{words}
	4489	followed by three @code{redirects} and another @code{words}. And so on.
	4490
	4491	Here are two ways to correct these rules. First, to make it a single level
	4492	of sequence:
	4493
	4494	@example
	4495	sequence: /* empty */
	4496	| sequence word
	4497	| sequence redirect
	4498	;
	4499	@end example
	4500
	4501	Second, to prevent either a @code{words} or a @code{redirects}
	4502	from being empty:
	4503
	4504	@example
	4505	sequence: /* empty */
	4506	| sequence words
	4507	| sequence redirects
	4508	;
	4509
	4510	words: word
	4511	| words word
	4512	;
	4513
	4514	redirects: redirect
	4515	| redirects redirect
	4516	;
	4517	@end example
	4518
	4519	@node Mystery Conflicts
	4520	@section Mysterious Reduce/Reduce Conflicts
	4521
	4522	Sometimes reduce/reduce conflicts can occur that don't look warranted.
	4523	Here is an example:
	4524
	4525	@example
	4526	@group
	4527	%token ID
	4528
	4529	%%
	4530	def: param_spec return_spec ','
	4531	;
	4532	param_spec:
	4533	type
	4534	| name_list ':' type
	4535	;
	4536	@end group
	4537	@group
	4538	return_spec:
	4539	type
	4540	| name ':' type
	4541	;
	4542	@end group
	4543	@group
	4544	type: ID
	4545	;
	4546	@end group
	4547	@group
	4548	name: ID
	4549	;
	4550	name_list:
	4551	name
	4552	| name ',' name_list
	4553	;
	4554	@end group
	4555	@end example
	4556
	4557	It would seem that this grammar can be parsed with only a single token
	4558	of look-ahead: when a @code{param_spec} is being read, an @code{ID} is
	4559	a @code{name} if a comma or colon follows, or a @code{type} if another
	4560	@code{ID} follows. In other words, this grammar is LR(1).
	4561
	4562	@cindex LR(1)
	4563	@cindex LALR(1)
	4564	However, Bison, like most parser generators, cannot actually handle all
	4565	LR(1) grammars. In this grammar, two contexts, that after an @code{ID}
	4566	at the beginning of a @code{param_spec} and likewise at the beginning of
	4567	a @code{return_spec}, are similar enough that Bison assumes they are the
	4568	same. They appear similar because the same set of rules would be
	4569	active---the rule for reducing to a @code{name} and that for reducing to
	4570	a @code{type}. Bison is unable to determine at that stage of processing
	4571	that the rules would require different look-ahead tokens in the two
	4572	contexts, so it makes a single parser state for them both. Combining
	4573	the two contexts causes a conflict later. In parser terminology, this
	4574	occurrence means that the grammar is not LALR(1).
	4575
	4576	In general, it is better to fix deficiencies than to document them. But
	4577	this particular deficiency is intrinsically hard to fix; parser
	4578	generators that can handle LR(1) grammars are hard to write and tend to
	4579	produce parsers that are very large. In practice, Bison is more useful
	4580	as it is now.
	4581
	4582	When the problem arises, you can often fix it by identifying the two
	4583	parser states that are being confused, and adding something to make them
	4584	look distinct. In the above example, adding one rule to
	4585	@code{return_spec} as follows makes the problem go away:
	4586
	4587	@example
	4588	@group
	4589	%token BOGUS
	4590	@dots{}
	4591	%%
	4592	@dots{}
	4593	return_spec:
	4594	type
	4595	| name ':' type
	4596	/* This rule is never used. */
	4597	| ID BOGUS
	4598	;
	4599	@end group
	4600	@end example
	4601
	4602	This corrects the problem because it introduces the possibility of an
	4603	additional active rule in the context after the @code{ID} at the beginning of
	4604	@code{return_spec}. This rule is not active in the corresponding context
	4605	in a @code{param_spec}, so the two contexts receive distinct parser states.
	4606	As long as the token @code{BOGUS} is never generated by @code{yylex},
	4607	the added rule cannot alter the way actual input is parsed.
	4608
	4609	In this particular example, there is another way to solve the problem:
	4610	rewrite the rule for @code{return_spec} to use @code{ID} directly
	4611	instead of via @code{name}. This also causes the two confusing
	4612	contexts to have different sets of active rules, because the one for
	4613	@code{return_spec} activates the altered rule for @code{return_spec}
	4614	rather than the one for @code{name}.
	4615
	4616	@example
	4617	param_spec:
	4618	type
	4619	| name_list ':' type
	4620	;
	4621	return_spec:
	4622	type
	4623	| ID ':' type
	4624	;
	4625	@end example
	4626
	4627	@node Stack Overflow
	4628	@section Stack Overflow, and How to Avoid It
	4629	@cindex stack overflow
	4630	@cindex parser stack overflow
	4631	@cindex overflow of parser stack
	4632
	4633	The Bison parser stack can overflow if too many tokens are shifted and
	4634	not reduced. When this happens, the parser function @code{yyparse}
	4635	returns a nonzero value, pausing only to call @code{yyerror} to report
	4636	the overflow.
	4637
	4638	@vindex YYMAXDEPTH
	4639	By defining the macro @code{YYMAXDEPTH}, you can control how deep the
	4640	parser stack can become before a stack overflow occurs. Define the
	4641	macro with a value that is an integer. This value is the maximum number
	4642	of tokens that can be shifted (and not reduced) before overflow.
	4643	It must be a constant expression whose value is known at compile time.
	4644
	4645	The stack space allowed is not necessarily allocated. If you specify a
	4646	large value for @code{YYMAXDEPTH}, the parser actually allocates a small
	4647	stack at first, and then makes it bigger by stages as needed. This
	4648	increasing allocation happens automatically and silently. Therefore,
	4649	you do not need to make @code{YYMAXDEPTH} painfully small merely to save
	4650	space for ordinary inputs that do not need much stack.
	4651
	4652	@cindex default stack limit
	4653	The default value of @code{YYMAXDEPTH}, if you do not define it, is
	4654	10000.
	4655
	4656	@vindex YYINITDEPTH
	4657	You can control how much stack is allocated initially by defining the
	4658	macro @code{YYINITDEPTH}. This value too must be a compile-time
	4659	constant integer. The default is 200.
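
The staged allocation can be pictured with the following C sketch. It is
an illustration under assumptions, not the parser's actual code: the names
@code{struct stack}, @code{stack_init} and @code{stack_push} are invented,
the two limits are hard-wired to the default values quoted above, and
doubling the capacity at each stage is a choice made for this sketch.

```c
#include <stdlib.h>

#define INIT_DEPTH 200          /* same as the default YYINITDEPTH */
#define MAX_DEPTH 10000         /* same as the default YYMAXDEPTH */

/* A growable stack of parser states (hypothetical type).  */
struct stack
{
  int *base;
  size_t size;                  /* entries in use */
  size_t capacity;              /* entries allocated */
};

/* Allocate the small initial stack.  Return 0 on success, -1 on failure.  */
int
stack_init (struct stack *s)
{
  s->base = malloc (INIT_DEPTH * sizeof *s->base);
  s->size = 0;
  s->capacity = s->base ? INIT_DEPTH : 0;
  return s->base ? 0 : -1;
}

/* Push STATE, growing the stack by stages as needed.  Return 0 on
   success, -1 on overflow (the limit is reached) or if memory runs out.  */
int
stack_push (struct stack *s, int state)
{
  if (s->size == s->capacity)
    {
      size_t new_capacity = s->capacity * 2;
      int *p;

      if (s->capacity >= MAX_DEPTH)
        return -1;              /* stack overflow */
      if (new_capacity > MAX_DEPTH)
        new_capacity = MAX_DEPTH;
      p = realloc (s->base, new_capacity * sizeof *p);
      if (p == NULL)
        return -1;
      s->base = p;
      s->capacity = new_capacity;
    }
  s->base[s->size++] = state;
  return 0;
}
```

In this sketch an input that never needs more than 200 entries never
triggers a reallocation, and the push that would create entry number 10001
is the first one to fail.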
	4660
	4661	@node Error Recovery
	4662	@chapter Error Recovery
	4663	@cindex error recovery
	4664	@cindex recovery from errors
	4665
	4666	It is not usually acceptable to have a program terminate on a parse
	4667	error. For example, a compiler should recover sufficiently to parse the
	4668	rest of the input file and check it for errors; a calculator should accept
	4669	another expression.
	4670
	4671	In a simple interactive command parser where each input is one line, it may
	4672	be sufficient to allow @code{yyparse} to return 1 on error and have the
	4673	caller ignore the rest of the input line when that happens (and then call
	4674	@code{yyparse} again). But this is inadequate for a compiler, because it
	4675	forgets all the syntactic context leading up to the error. A syntax error
	4676	deep within a function in the compiler input should not cause the compiler
	4677	to treat the following line like the beginning of a source file.
	4678
	4679	@findex error
	4680	You can define how to recover from a syntax error by writing rules to
	4681	recognize the special token @code{error}. This is a terminal symbol that
	4682	is always defined (you need not declare it) and reserved for error
	4683	handling. The Bison parser generates an @code{error} token whenever a
	4684	syntax error happens; if you have provided a rule to recognize this token
	4685	in the current context, the parse can continue.
	4686
	4687	For example:
	4688
	4689	@example
	4690	stmnts: /* empty string */
	4691	| stmnts '\n'
	4692	| stmnts exp '\n'
	4693	| stmnts error '\n'
	4694	@end example
	4695
	4696	The fourth rule in this example says that an error followed by a newline
	4697	makes a valid addition to any @code{stmnts}.
	4698
	4699	What happens if a syntax error occurs in the middle of an @code{exp}? The
	4700	error recovery rule, interpreted strictly, applies to the precise sequence
	4701	of a @code{stmnts}, an @code{error} and a newline. If an error occurs in
	4702	the middle of an @code{exp}, there will probably be some additional tokens
	4703	and subexpressions on the stack after the last @code{stmnts}, and there
	4704	will be tokens to read before the next newline. So the rule is not
	4705	applicable in the ordinary way.
	4706
	4707	But Bison can force the situation to fit the rule, by discarding part of
	4708	the semantic context and part of the input. First it discards states and
	4709	objects from the stack until it gets back to a state in which the
	4710	@code{error} token is acceptable. (This means that the subexpressions
	4711	already parsed are discarded, back to the last complete @code{stmnts}.) At
	4712	this point the @code{error} token can be shifted. Then, if the old
	4713	look-ahead token is not acceptable to be shifted next, the parser reads
	4714	tokens and discards them until it finds a token which is acceptable. In
	4715	this example, Bison reads and discards input until the next newline
	4716	so that the fourth rule can apply.
	4717
	4718	The choice of error rules in the grammar is a choice of strategies for
	4719	error recovery. A simple and useful strategy is simply to skip the rest of
	4720	the current input line or current statement if an error is detected:
	4721
	4722	@example
	4723	stmnt: error ';' /* on error, skip until ';' is read */
	4724	@end example
	4725
	4726	It is also useful to recover to the matching close-delimiter of an
	4727	opening-delimiter that has already been parsed. Otherwise the
	4728	close-delimiter will probably appear to be unmatched, and generate another,
	4729	spurious error message:
	4730
	4731	@example
	4732	primary: '(' expr ')'
	4733	| '(' error ')'
	4734	@dots{}
	4735	;
	4736	@end example
	4737
	4738	Error recovery strategies are necessarily guesses. When they guess wrong,
	4739	one syntax error often leads to another. In the @code{stmnt} example above, the error
	4740	recovery rule guesses that an error is due to bad input within one
	4741	@code{stmnt}. Suppose that instead a spurious semicolon is inserted in the
	4742	middle of a valid @code{stmnt}. After the error recovery rule recovers
	4743	from the first error, another syntax error will be found straightaway,
	4744	since the text following the spurious semicolon is also an invalid
	4745	@code{stmnt}.
	4746
	4747	To prevent an outpouring of error messages, the parser will output no error
	4748	message for another syntax error that happens shortly after the first; only
	4749	after three consecutive input tokens have been successfully shifted will
	4750	error messages resume.
	4751
	4752	Note that rules which accept the @code{error} token may have actions, just
	4753	as any other rules can.
	4754
	4755	@findex yyerrok
	4756	You can make error messages resume immediately by using the macro
	4757	@code{yyerrok} in an action. If you do this in the error rule's action, no
	4758	error messages will be suppressed. This macro requires no arguments;
	4759	@samp{yyerrok;} is a valid C statement.
	4760
	4761	@findex yyclearin
	4762	The previous look-ahead token is reanalyzed immediately after an error. If
	4763	this is unacceptable, then the macro @code{yyclearin} may be used to clear
	4764	this token. Write the statement @samp{yyclearin;} in the error rule's
	4765	action.
	4766
	4767	For example, suppose that on a parse error, an error handling routine is
	4768	called that advances the input stream to some point where parsing should
	4769	once again commence. The next symbol returned by the lexical scanner is
	4770	probably correct. The previous look-ahead token ought to be discarded
	4771	with @samp{yyclearin;}.
	4772
	4773	@vindex YYRECOVERING
	4774	The macro @code{YYRECOVERING} stands for an expression that has the
	4775	value 1 when the parser is recovering from a syntax error, and 0 the
	4776	rest of the time. A value of 1 indicates that error messages are
	4777	currently suppressed for new syntax errors.
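
The suppression of repeated messages amounts to a small counter, which the
following C sketch models. The structure and function names are
hypothetical, chosen for this illustration; they only mirror the behavior
described in this chapter for error reporting, @code{yyerrok} and
@code{YYRECOVERING}, and are not the parser's real internals.

```c
/* Hypothetical model of the message-suppression counter.  */
struct recovery
{
  int errstatus;        /* input tokens still to shift before messages resume */
  int reported;         /* how many error messages have been printed */
};

/* A syntax error was detected.  Report it only if the parser is not
   already recovering, then demand three good shifts.  */
void
on_syntax_error (struct recovery *r)
{
  if (r->errstatus == 0)
    r->reported++;
  r->errstatus = 3;
}

/* An input token was successfully shifted.  */
void
on_shift (struct recovery *r)
{
  if (r->errstatus > 0)
    r->errstatus--;
}

/* Counterpart of yyerrok: resume error messages immediately.  */
void
errok (struct recovery *r)
{
  r->errstatus = 0;
}

/* Counterpart of YYRECOVERING: 1 while messages are suppressed.  */
int
recovering (const struct recovery *r)
{
  return r->errstatus != 0;
}
```

In this model a second error that arrives after only two shifted tokens
leaves @code{reported} unchanged, whereas one that arrives after three
shifted tokens, or after a call to @code{errok}, is reported again.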
	4778
	4779	@node Context Dependency
	4780	@chapter Handling Context Dependencies
	4781
	4782	The Bison paradigm is to parse tokens first, then group them into larger
	4783	syntactic units. In many languages, the meaning of a token is affected by
	4784	its context. Although this violates the Bison paradigm, certain techniques
	4785	(known as @dfn{kludges}) may enable you to write Bison parsers for such
	4786	languages.
	4787
	4788	@menu
	4789	* Semantic Tokens:: Token parsing can depend on the semantic context.
	4790	* Lexical Tie-ins:: Token parsing can depend on the syntactic context.
	4791	* Tie-in Recovery:: Lexical tie-ins have implications for how
	4792	error recovery rules must be written.
	4793	@end menu
	4794
	4795	(Actually, ``kludge'' means any technique that gets its job done but is
	4796	neither clean nor robust.)
	4797
	4798	@node Semantic Tokens
	4799	@section Semantic Info in Token Types
	4800
	4801	The C language has a context dependency: the way an identifier is used
	4802	depends on what its current meaning is. For example, consider this:
	4803
	4804	@example
	4805	foo (x);
	4806	@end example
	4807
	4808	This looks like a function call statement, but if @code{foo} is a typedef
	4809	name, then this is actually a declaration of @code{x}. How can a Bison
	4810	parser for C decide how to parse this input?
	4811
	4812	The method used in GNU C is to have two different token types,
	4813	@code{IDENTIFIER} and @code{TYPENAME}. When @code{yylex} finds an
	4814	identifier, it looks up the current declaration of the identifier in order
	4815	to decide which token type to return: @code{TYPENAME} if the identifier is
	4816	declared as a typedef, @code{IDENTIFIER} otherwise.
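
The lookup that @code{yylex} performs can be sketched as follows. This is
a deliberately simplified illustration, not the code used in GNU C: the
flat table, the functions @code{declare_typedef} and
@code{classify_identifier}, and the numeric token codes are all
assumptions made for the example, and scoping is ignored entirely.

```c
#include <string.h>

/* Token codes; the numeric values are arbitrary in this sketch.  */
enum token_type { IDENTIFIER = 258, TYPENAME = 259 };

#define MAX_TYPEDEFS 64

/* A flat table of names currently declared as typedefs.  */
static const char *typedef_names[MAX_TYPEDEFS];
static int typedef_count;

/* Record NAME as a typedef name.  The string is not copied, so it
   must outlive the table.  Return 0 on success, -1 if the table is full.  */
int
declare_typedef (const char *name)
{
  if (typedef_count == MAX_TYPEDEFS)
    return -1;
  typedef_names[typedef_count++] = name;
  return 0;
}

/* Choose the token type for an identifier, as the lexical analyzer
   would do just before returning it to the parser.  */
enum token_type
classify_identifier (const char *name)
{
  int i;

  for (i = 0; i < typedef_count; i++)
    if (strcmp (typedef_names[i], name) == 0)
      return TYPENAME;
  return IDENTIFIER;
}
```

Before @code{declare_typedef ("foo")} is called, the lexical analyzer
would return @code{IDENTIFIER} for @code{foo}; afterwards it would return
@code{TYPENAME}, so the same spelling reaches the grammar as a different
terminal symbol.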
	4817
	4818	The grammar rules can then express the context dependency by the choice of
	4819	token type to recognize. @code{IDENTIFIER} is accepted as an expression,
	4820	but @code{TYPENAME} is not. @code{TYPENAME} can start a declaration, but
	4821	@code{IDENTIFIER} cannot. In contexts where the meaning of the identifier
	4822	is @emph{not} significant, such as in declarations that can shadow a
	4823	typedef name, either @code{TYPENAME} or @code{IDENTIFIER} is
	4824	accepted---there is one rule for each of the two token types.
	4825
	4826	This technique is simple to use if the decision of which kinds of
	4827	identifiers to allow is made at a place close to where the identifier is
	4828	parsed. But in C this is not always so: C allows a declaration to
	4829	redeclare a typedef name provided an explicit type has been specified
	4830	earlier:
	4831
	4832	@example
	4833	typedef int foo, bar, lose;
	4834	static foo (bar); /* @r{redeclare @code{bar} as static variable} */
	4835	static int foo (lose); /* @r{redeclare @code{foo} as function} */
	4836	@end example
	4837
	4838	Unfortunately, the name being declared is separated from the declaration
	4839	construct itself by a complicated syntactic structure---the ``declarator''.
	4840
	4841	As a result, part of the Bison parser for C needs to be duplicated, with
	4842	all the nonterminal names changed: once for parsing a declaration in
	4843	which a typedef name can be redefined, and once for parsing a
	4844	declaration in which that can't be done. Here is a part of the
	4845	duplication, with actions omitted for brevity:
	4846
	4847	@example
	4848	initdcl:
	4849	declarator maybeasm '='
	4850	init
	4851	| declarator maybeasm
	4852	;
	4853
	4854	notype_initdcl:
	4855	notype_declarator maybeasm '='
	4856	init
	4857	| notype_declarator maybeasm
	4858	;
	4859	@end example
	4860
	4861	@noindent
	4862	Here @code{initdcl} can redeclare a typedef name, but @code{notype_initdcl}
	4863	cannot. The distinction between @code{declarator} and
	4864	@code{notype_declarator} is the same sort of thing.
	4865
	4866	There is some similarity between this technique and a lexical tie-in
	4867	(described next), in that information which alters the lexical analysis is
	4868	changed during parsing by other parts of the program. The difference is
	4869	that here the information is global, and is used for other purposes in the
	4870	program. A true lexical tie-in has a special-purpose flag controlled by
	4871	the syntactic context.
	4872
	4873	@node Lexical Tie-ins
	4874	@section Lexical Tie-ins
	4875	@cindex lexical tie-in
	4876
	4877	One way to handle context-dependency is the @dfn{lexical tie-in}: a flag
	4878	which is set by Bison actions, whose purpose is to alter the way tokens are
	4879	parsed.
	4880
	4881	For example, suppose we have a language vaguely like C, but with a special
	4882	construct @samp{hex (@var{hex-expr})}. After the keyword @code{hex} comes
	4883	an expression in parentheses in which all integers are hexadecimal. In
	4884	particular, the token @samp{a1b} must be treated as an integer rather than
	4885	as an identifier if it appears in that context. Here is how you can do it:
	4886
	4887	@example
	4888	@group
	4889	%@{
	4890	int hexflag;
	4891	%@}
	4892	%%
	4893	@dots{}
	4894	@end group
	4895	@group
	4896	expr: IDENTIFIER
	4897	| constant
	4898	| HEX '('
	4899	@{ hexflag = 1; @}
	4900	expr ')'
	4901	@{ hexflag = 0;
	4902	$$ = $4; @}
	4903	| expr '+' expr
	4904	@{ $$ = make_sum ($1, $3); @}
	4905	@dots{}
	4906	;
	4907	@end group
	4908
	4909	@group
	4910	constant:
	4911	INTEGER
	4912	| STRING
	4913	;
	4914	@end group
	4915	@end example
	4916
	4917	@noindent
	4918	Here we assume that @code{yylex} looks at the value of @code{hexflag}; when
	4919	it is nonzero, all integers are parsed in hexadecimal, and tokens starting
	4920	with letters are parsed as integers if possible.
	4921
	4922	The declaration of @code{hexflag} shown in the prologue of the parser file
	4923	is needed to make it accessible to the actions (@pxref{Prologue, ,The Prologue}).
	4924	You must also write the code in @code{yylex} to obey the flag.
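
As a rough sketch of the @code{yylex} side, the function below classifies one word that the scanner has already collected. The token codes, the variable @code{token_value}, and the function name @code{classify_word} are assumptions made for this example; only @code{hexflag} comes from the grammar above.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token codes; a real parser gets these from Bison.  */
enum { IDENTIFIER = 258, INTEGER = 259 };

int hexflag;        /* set and cleared by the parser actions */
long token_value;   /* hypothetical stand-in for the semantic value */

/* Classify one already-scanned word TEXT made of letters and digits.  */
int
classify_word (const char *text)
{
  char *end;
  int base = hexflag ? 16 : 10;

  /* Outside a hex construct, a word starting with a letter is an
     identifier, even if it happens to look like a hex number.  */
  if (!hexflag && isalpha ((unsigned char) text[0]))
    return IDENTIFIER;

  /* Otherwise it is an integer only if the whole word converts.  */
  token_value = strtol (text, &end, base);
  if (end != text && *end == '\0')
    return INTEGER;
  return IDENTIFIER;
}
```

With @code{hexflag} clear, @samp{a1b} comes back as @code{IDENTIFIER}; with it set, the same text comes back as @code{INTEGER}.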
	4925
	4926	@node Tie-in Recovery
	4927	@section Lexical Tie-ins and Error Recovery
	4928
	4929	Lexical tie-ins make strict demands on any error recovery rules you have.
	4930	@xref{Error Recovery}.
	4931
	4932	The reason for this is that the purpose of an error recovery rule is to
	4933	abort the parsing of one construct and resume in some larger construct.
	4934	For example, in C-like languages, a typical error recovery rule is to skip
	4935	tokens until the next semicolon, and then start a new statement, like this:
	4936
	4937	@example
	4938	stmt: expr ';'
	4939	| IF '(' expr ')' stmt @{ @dots{} @}
	4940	@dots{}
	4941	error ';'
	4942	@{ hexflag = 0; @}
	4943	;
	4944	@end example
	4945
	4946	If there is a syntax error in the middle of a @samp{hex (@var{expr})}
	4947	construct, this error rule will apply, and then the action for the
	4948	completed @samp{hex (@var{expr})} will never run. So @code{hexflag} would
	4949	remain set for the entire rest of the input, or until the next @code{hex}
	4950	keyword, causing identifiers to be misinterpreted as integers.
	4951
	4952	To avoid this problem the error recovery rule itself clears @code{hexflag}.
	4953
	4954	There may also be an error recovery rule that works within expressions.
	4955	For example, there could be a rule which applies within parentheses
	4956	and skips to the close-parenthesis:
	4957
	4958	@example
	4959	@group
	4960	expr: @dots{}
	4961	| '(' expr ')'
	4962	@{ $$ = $2; @}
	4963	| '(' error ')'
	4964	@dots{}
	4965	@end group
	4966	@end example
	4967
	4968	If this rule acts within the @code{hex} construct, it is not going to abort
	4969	that construct (since it applies to an inner level of parentheses within
	4970	the construct). Therefore, it should not clear the flag: the rest of
	4971	the @code{hex} construct should be parsed with the flag still in effect.
	4972
	4973	What if there is an error recovery rule which might abort out of the
	4974	@code{hex} construct or might not, depending on circumstances? There is no
	4975	way you can write the action to determine whether a @code{hex} construct is
	4976	being aborted or not. So if you are using a lexical tie-in, you had better
	4977	make sure your error recovery rules are not of this kind. Each rule must
	4978	be such that you can be sure that it always will, or always won't, have to
	4979	clear the flag.
	4980
	4981	@c ================================================== Debugging Your Parser
	4982
	4983	@node Debugging
	4984	@chapter Debugging Your Parser
	4985
	4986	Developing a parser can be a challenge, especially if you don't
	4987	understand the algorithm (@pxref{Algorithm, ,The Bison Parser
	4988	Algorithm}). Even so, sometimes a detailed description of the automaton
	4989	can help (@pxref{Understanding, , Understanding Your Parser}), or
	4990	tracing the execution of the parser can give some insight on why it
	4991	behaves improperly (@pxref{Tracing, , Tracing Your Parser}).
	4992
	4993	@menu
	4994	* Understanding:: Understanding the structure of your parser.
	4995	* Tracing:: Tracing the execution of your parser.
	4996	@end menu
	4997
	4998	@node Understanding
	4999	@section Understanding Your Parser
	5000
	5001	As documented elsewhere (@pxref{Algorithm, ,The Bison Parser Algorithm}),
	5002	Bison parsers are @dfn{shift/reduce automata}. In some cases (much more
	5003	frequent than one would hope), looking at this automaton is required to
	5004	tune or simply fix a parser. Bison provides two different
	5005	representations of it, either textually or graphically (as a @sc{vcg}
	5006	file).
	5007
	5008	The textual file is generated when the options @option{--report} or
	5009	@option{--verbose} are specified (@pxref{Invocation, , Invoking
	5010	Bison}). Its name is made by removing @samp{.tab.c} or @samp{.c} from
	5011	the parser output file name, and adding @samp{.output} instead.
	5012	Therefore, if the input file is @file{foo.y}, then the parser file is
	5013	called @file{foo.tab.c} by default. As a consequence, the verbose
	5014	output file is called @file{foo.output}.
	5015
	5016	The following grammar file, @file{calc.y}, is used in the rest of this section:
	5017
	5018	@example
	5019	%token NUM STR
	5020	%left '+' '-'
	5021	%left '*'
	5022	%%
	5023	exp: exp '+' exp
	5024	| exp '-' exp
	5025	| exp '*' exp
	5026	| exp '/' exp
	5027	| NUM
	5028	;
	5029	useless: STR;
	5030	%%
	5031	@end example
	5032
	5033	@command{bison} reports that @samp{calc.y contains 1 useless nonterminal
	5034	and 1 useless rule} and that @samp{calc.y contains 7 shift/reduce
	5035	conflicts}. When given @option{--report=state}, in addition to
	5036	@file{calc.tab.c}, it creates a file @file{calc.output} with contents
	5037	detailed below. The order of the output and the exact presentation
	5038	might vary, but the interpretation is the same.
	5039
	5040	The first section includes details on conflicts that were solved thanks
	5041	to precedence and/or associativity:
	5042
	5043	@example
	5044	Conflict in state 8 between rule 1 and token '+' resolved as reduce.
	5045	Conflict in state 8 between rule 1 and token '-' resolved as reduce.
	5046	Conflict in state 8 between rule 1 and token '*' resolved as shift.
	5047	@exdent @dots{}
	5048	@end example
	5049
	5050	@noindent
	5051	The next section lists states that still have conflicts.
	5052
	5053	@example
	5054	State 8 contains 1 shift/reduce conflict.
	5055	State 9 contains 1 shift/reduce conflict.
	5056	State 10 contains 1 shift/reduce conflict.
	5057	State 11 contains 4 shift/reduce conflicts.
	5058	@end example
	5059
	5060	@noindent
	5061	@cindex token, useless
	5062	@cindex useless token
	5063	@cindex nonterminal, useless
	5064	@cindex useless nonterminal
	5065	@cindex rule, useless
	5066	@cindex useless rule
	5067	The next section reports useless tokens, nonterminals and rules. Useless
	5068	nonterminals and rules are removed in order to produce a smaller parser,
	5069	but useless tokens are preserved, since they might be used by the
	5070	scanner (note the difference between ``useless'' and ``not used''
	5071	below):
	5072
	5073	@example
	5074	Useless nonterminals:
	5075	useless
	5076
	5077	Terminals which are not used:
	5078	STR
	5079
	5080	Useless rules:
	5081	#6 useless: STR;
	5082	@end example
	5083
	5084	@noindent
	5085	The next section reproduces the exact grammar that Bison used:
	5086
	5087	@example
	5088	Grammar
	5089
	5090	Number, Line, Rule
	5091	0 5 $axiom -> exp $
	5092	1 5 exp -> exp '+' exp
	5093	2 6 exp -> exp '-' exp
	5094	3 7 exp -> exp '*' exp
	5095	4 8 exp -> exp '/' exp
	5096	5 9 exp -> NUM
	5097	@end example
	5098
	5099	@noindent
	5100	and reports the uses of the symbols:
	5101
	5102	@example
	5103	Terminals, with rules where they appear
	5104
	5105	$ (0) 0
	5106	'*' (42) 3
	5107	'+' (43) 1
	5108	'-' (45) 2
	5109	'/' (47) 4
	5110	error (256)
	5111	NUM (258) 5
	5112
	5113	Nonterminals, with rules where they appear
	5114
	5115	$axiom (8)
	5116	on left: 0
	5117	exp (9)
	5118	on left: 1 2 3 4 5, on right: 0 1 2 3 4
	5119	@end example
	5120
	5121	@noindent
	5122	@cindex item
	5123	@cindex pointed rule
	5124	@cindex rule, pointed
	5125	Bison then proceeds onto the automaton itself, describing each state
	5126	with its set of @dfn{items}, also known as @dfn{pointed rules}. Each
	5127	item is a production rule together with a point (marked by @samp{.})
	5128	that represents the input cursor.
	5129
	5130	@example
	5131	state 0
	5132
	5133	$axiom -> . exp $ (rule 0)
	5134
	5135	NUM shift, and go to state 1
	5136
	5137	exp go to state 2
	5138	@end example
	5139
	5140	This reads as follows: ``state 0 corresponds to being at the very
	5141	beginning of the parsing, in the initial rule, right before the start
	5142	symbol (here, @code{exp}). When the parser returns to this state right
	5143	after having reduced a rule that produced an @code{exp}, the control
	5144	flow jumps to state 2. If there is no such transition on a nonterminal
	5145	symbol, and the lookahead is a @code{NUM}, then this token is shifted on
	5146	the parse stack, and the control flow jumps to state 1. Any other
	5147	lookahead triggers a parse error.''
	5148
	5149	@cindex core, item set
	5150	@cindex item set core
	5151	@cindex kernel, item set
	5152	@cindex item set kernel
	5153	Even though the only active rule in state 0 seems to be rule 0, the
	5154	report lists @code{NUM} as a lookahead symbol because @code{NUM} can be
	5155	at the beginning of any rule deriving an @code{exp}. By default Bison
	5156	reports the so-called @dfn{core} or @dfn{kernel} of the item set, but if
	5157	you want to see more detail you can invoke @command{bison} with
	5158	@option{--report=itemset} to list all the items, including those that can
	5159	be derived:
	5160
	5161	@example
	5162	state 0
	5163
	5164	$axiom -> . exp $ (rule 0)
	5165	exp -> . exp '+' exp (rule 1)
	5166	exp -> . exp '-' exp (rule 2)
	5167	exp -> . exp '*' exp (rule 3)
	5168	exp -> . exp '/' exp (rule 4)
	5169	exp -> . NUM (rule 5)
	5170
	5171	NUM shift, and go to state 1
	5172
	5173	exp go to state 2
	5174	@end example
	5175
	5176	@noindent
	5177	In the state 1...
	5178
	5179	@example
	5180	state 1
	5181
	5182	exp -> NUM . (rule 5)
	5183
	5184	$default reduce using rule 5 (exp)
	5185	@end example
	5186
	5187	@noindent
	5188	the rule 5, @samp{exp: NUM;}, is completed. Whatever the lookahead
	5189	(@samp{$default}), the parser will reduce it. If it was coming from
	5190	state 0, then, after this reduction it will return to state 0, and will
	5191	jump to state 2 (@samp{exp: go to state 2}).
	5192
	5193	@example
	5194	state 2
	5195
	5196	$axiom -> exp . $ (rule 0)
	5197	exp -> exp . '+' exp (rule 1)
	5198	exp -> exp . '-' exp (rule 2)
	5199	exp -> exp . '*' exp (rule 3)
	5200	exp -> exp . '/' exp (rule 4)
	5201
	5202	$ shift, and go to state 3
	5203	'+' shift, and go to state 4
	5204	'-' shift, and go to state 5
	5205	'*' shift, and go to state 6
	5206	'/' shift, and go to state 7
	5207	@end example
	5208
	5209	@noindent
	5210	In state 2, the automaton can only shift a symbol. For instance,
	5211	because of the item @samp{exp -> exp . '+' exp}, if the lookahead is
	5212	@samp{+}, it will be shifted on the parse stack, and the automaton
	5213	control will jump to state 4, corresponding to the item @samp{exp -> exp
	5214	'+' . exp}. Since there is no default action, any other token than
	5215	those listed above will trigger a parse error.
	5216
	5217	The state 3 is named the @dfn{final state}, or the @dfn{accepting
	5218	state}:
	5219
	5220	@example
	5221	state 3
	5222
	5223	$axiom -> exp $ . (rule 0)
	5224
	5225	$default accept
	5226	@end example
	5227
	5228	@noindent
	5229	Here the initial rule is completed (the start symbol and the end
	5230	of input were read), so parsing exits successfully.
	5231
	5232	The interpretation of states 4 to 7 is straightforward, and is left to
	5233	the reader.
	5234
	5235	@example
	5236	state 4
	5237
	5238	exp -> exp '+' . exp (rule 1)
	5239
	5240	NUM shift, and go to state 1
	5241
	5242	exp go to state 8
	5243
	5244	state 5
	5245
	5246	exp -> exp '-' . exp (rule 2)
	5247
	5248	NUM shift, and go to state 1
	5249
	5250	exp go to state 9
	5251
	5252	state 6
	5253
	5254	exp -> exp '*' . exp (rule 3)
	5255
	5256	NUM shift, and go to state 1
	5257
	5258	exp go to state 10
	5259
	5260	state 7
	5261
	5262	exp -> exp '/' . exp (rule 4)
	5263
	5264	NUM shift, and go to state 1
	5265
	5266	exp go to state 11
	5267	@end example
	5268
	5269	As was announced at the beginning of the report, @samp{State 8 contains 1
	5270	shift/reduce conflict}:
	5271
	5272	@example
	5273	state 8
	5274
	5275	exp -> exp . '+' exp (rule 1)
	5276	exp -> exp '+' exp . (rule 1)
	5277	exp -> exp . '-' exp (rule 2)
	5278	exp -> exp . '*' exp (rule 3)
	5279	exp -> exp . '/' exp (rule 4)
	5280
	5281	'*' shift, and go to state 6
	5282	'/' shift, and go to state 7
	5283
	5284	'/' [reduce using rule 1 (exp)]
	5285	$default reduce using rule 1 (exp)
	5286	@end example
	5287
	5288	Indeed, there are two actions associated with the lookahead @samp{/}:
	5289	either shifting (and going to state 7), or reducing rule 1. The
	5290	conflict means that either the grammar is ambiguous, or the parser lacks
	5291	information to make the right decision. Indeed the grammar is
	5292	ambiguous, as, since we did not specify the precedence of @samp{/}, the
	5293	sentence @samp{NUM + NUM / NUM} can be parsed as @samp{NUM + (NUM /
	5294	NUM)}, which corresponds to shifting @samp{/}, or as @samp{(NUM + NUM) /
	5295	NUM}, which corresponds to reducing rule 1.
	5296
	5297	Because in LALR(1) parsing only a single decision can be made, Bison
	5298	arbitrarily chose to disable the reduction, see @ref{Shift/Reduce, ,
	5299	Shift/Reduce Conflicts}. Discarded actions are reported in between
	5300	square brackets.
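
To see why the choice matters, the two parses of @samp{NUM + NUM / NUM} can be written out as plain C arithmetic. This is only an illustration of the two groupings; the function names are made up for the example and nothing here is generated by Bison.

```c
/* Grouping obtained by shifting '/':  a + (b / c).  */
int
parse_by_shifting (int a, int b, int c)
{
  return a + (b / c);
}

/* Grouping obtained by reducing rule 1 first:  (a + b) / c.  */
int
parse_by_reducing (int a, int b, int c)
{
  return (a + b) / c;
}
```

For the input @samp{8 + 4 / 2}, shifting yields 10 while reducing yields 6, so the unresolved conflict changes the computed value and not merely the shape of the parse.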
	5301
	5302	Note that all the previous states had a single possible action: either
	5303	shifting the next token and going to the corresponding state, or
	5304	reducing a single rule. In the other cases, i.e., when shifting
	5305	@emph{and} reducing is possible or when @emph{several} reductions are
	5306	possible, the lookahead is required to select the action. State 8 is
	5307	one such state: if the lookahead is @samp{*} or @samp{/} then the action
	5308	is shifting, otherwise the action is reducing rule 1. In other words,
	5309	the first two items, corresponding to rule 1, are not eligible when the
	5310	lookahead is @samp{*}, since we specified that @samp{*} has higher
	5311	precedence than @samp{+}. More generally, some items are eligible only
	5312	with some set of possible lookaheads. When run with
	5313	@option{--report=lookahead}, Bison specifies these lookaheads:
	5314
	5315	@example
	5316	state 8
	5317
	5318	exp -> exp . '+' exp [$, '+', '-', '/'] (rule 1)
	5319	exp -> exp '+' exp . [$, '+', '-', '/'] (rule 1)
	5320	exp -> exp . '-' exp (rule 2)
	5321	exp -> exp . '*' exp (rule 3)
	5322	exp -> exp . '/' exp (rule 4)
	5323
	5324	'*' shift, and go to state 6
	5325	'/' shift, and go to state 7
	5326
	5327	'/' [reduce using rule 1 (exp)]
	5328	$default reduce using rule 1 (exp)
	5329	@end example
	5330
	5331	The remaining states are similar:
	5332
	5333	@example
	5334	state 9
	5335
	5336	exp -> exp . '+' exp (rule 1)
	5337	exp -> exp . '-' exp (rule 2)
	5338	exp -> exp '-' exp . (rule 2)
	5339	exp -> exp . '*' exp (rule 3)
	5340	exp -> exp . '/' exp (rule 4)
	5341
	5342	'*' shift, and go to state 6
	5343	'/' shift, and go to state 7
	5344
	5345	'/' [reduce using rule 2 (exp)]
	5346	$default reduce using rule 2 (exp)
	5347
	5348	state 10
	5349
	5350	exp -> exp . '+' exp (rule 1)
	5351	exp -> exp . '-' exp (rule 2)
	5352	exp -> exp . '*' exp (rule 3)
	5353	exp -> exp '*' exp . (rule 3)
	5354	exp -> exp . '/' exp (rule 4)
	5355
	5356	'/' shift, and go to state 7
	5357
	5358	'/' [reduce using rule 3 (exp)]
	5359	$default reduce using rule 3 (exp)
	5360
	5361	state 11
	5362
	5363	exp -> exp . '+' exp (rule 1)
	5364	exp -> exp . '-' exp (rule 2)
	5365	exp -> exp . '*' exp (rule 3)
	5366	exp -> exp . '/' exp (rule 4)
	5367	exp -> exp '/' exp . (rule 4)
	5368
	5369	'+' shift, and go to state 4
	5370	'-' shift, and go to state 5
	5371	'*' shift, and go to state 6
	5372	'/' shift, and go to state 7
	5373
	5374	'+' [reduce using rule 4 (exp)]
	5375	'-' [reduce using rule 4 (exp)]
	5376	'*' [reduce using rule 4 (exp)]
	5377	'/' [reduce using rule 4 (exp)]
	5378	$default reduce using rule 4 (exp)
	5379	@end example
	5380
	5381	@noindent
	5382	Observe that state 11 contains conflicts due to the lack of precedence
	5383	of @samp{/} with respect to @samp{+}, @samp{-}, and @samp{*}, but also because the
	5384	associativity of @samp{/} is not specified.
	5385
	5386
	5387	@node Tracing
	5388	@section Tracing Your Parser
	5389	@findex yydebug
	5390	@cindex debugging
	5391	@cindex tracing the parser
	5392
	5393	If a Bison grammar compiles properly but doesn't do what you want when it
	5394	runs, the @code{yydebug} parser-trace feature can help you figure out why.
	5395
	5396	There are several means to enable compilation of trace facilities:
	5397
	5398	@table @asis
	5399	@item the macro @code{YYDEBUG}
	5400	@findex YYDEBUG
	5401	Define the macro @code{YYDEBUG} to a nonzero value when you compile the
	5402	parser. This is compliant with POSIX Yacc. You could use
	5403	@samp{-DYYDEBUG=1} as a compiler option or you could put @samp{#define
	5404	YYDEBUG 1} in the prologue of the grammar file (@pxref{Prologue, , The
	5405	Prologue}).
	5406
	5407	@item the option @option{-t}, @option{--debug}
	5408	Use the @samp{-t} option when you run Bison (@pxref{Invocation,
	5409	,Invoking Bison}). This is POSIX compliant too.
	5410
	5411	@item the directive @samp{%debug}
	5412	@findex %debug
	5413	Add the @code{%debug} directive (@pxref{Decl Summary, ,Bison
	5414	Declaration Summary}). This is a Bison extension, which will prove
	5415	useful when Bison outputs parsers for languages that don't use a
	5416	preprocessor. Unless POSIX and Yacc portability matter to you, this is
	5417	the preferred solution.
	5418	@end table
	5419
	5420	We suggest that you always enable the debug option so that debugging is
	5421	always possible.
	5422
	5423	The trace facility outputs messages with macro calls of the form
	5424	@code{YYFPRINTF (stderr, @var{format}, @var{args})} where
	5425	@var{format} and @var{args} are the usual @code{printf} format and
	5426	arguments. If you define @code{YYDEBUG} to a nonzero value but do not
	5427	define @code{YYFPRINTF}, @code{<stdio.h>} is automatically included
	5428	and @code{YYFPRINTF} is defined to @code{fprintf}.
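
Because the trace goes through @code{YYFPRINTF}, you can supply your own definition to redirect or post-process it. The sketch below assumes such a definition is placed in the prologue of the grammar file; the wrapper @code{counting_fprintf} and the counter @code{trace_messages} are hypothetical names invented for the example.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical counter of trace messages emitted so far.  */
int trace_messages;

/* fprintf-compatible wrapper that counts each call before printing.  */
int
counting_fprintf (FILE *stream, const char *format, ...)
{
  va_list args;
  int n;

  trace_messages++;
  va_start (args, format);
  n = vfprintf (stream, format, args);
  va_end (args);
  return n;
}

/* Route the parser's trace output through the wrapper.  */
#define YYFPRINTF counting_fprintf
```

Any replacement must accept the same arguments as @code{fprintf}, since that is how the trace code calls it.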
	5429
	5430	Once you have compiled the program with trace facilities, the way to
	5431	request a trace is to store a nonzero value in the variable @code{yydebug}.
	5432	You can do this by making the C code do it (in @code{main}, perhaps), or
	5433	you can alter the value with a C debugger.
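
For instance, a program might turn tracing on from a command-line flag. In this sketch the flag name @samp{--trace} and the function @code{enable_trace_from_args} are assumptions, and @code{yydebug} is defined locally only so that the fragment stands alone; in a real program it is defined by the parser file and you would declare it with @code{extern int yydebug;}.

```c
#include <string.h>

/* Stand-in for the variable defined by the generated parser.  In a real
   program, write `extern int yydebug;' instead of defining it here.  */
int yydebug;

/* Enable tracing if the hypothetical `--trace' flag appears in ARGV.
   Returns the resulting value of yydebug.  */
int
enable_trace_from_args (int argc, char **argv)
{
  int i;
  for (i = 1; i < argc; i++)
    if (strcmp (argv[i], "--trace") == 0)
      yydebug = 1;
  return yydebug;
}
```

Call it from @code{main} before calling @code{yyparse}, so that the trace covers the whole parse.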
	5434
	5435	Each step taken by the parser when @code{yydebug} is nonzero produces a
	5436	line or two of trace information, written on @code{stderr}. The trace
	5437	messages tell you these things:
	5438
	5439	@itemize @bullet
	5440	@item
	5441	Each time the parser calls @code{yylex}, what kind of token was read.
	5442
	5443	@item
	5444	Each time a token is shifted, the depth and complete contents of the
	5445	state stack (@pxref{Parser States}).
	5446
	5447	@item
	5448	Each time a rule is reduced, which rule it is, and the complete contents
	5449	of the state stack afterward.
	5450	@end itemize
	5451
	5452	To make sense of this information, it helps to refer to the listing file
	5453	produced by the Bison @samp{-v} option (@pxref{Invocation, ,Invoking
	5454	Bison}). This file shows the meaning of each state in terms of
	5455	positions in various rules, and also what each state will do with each
	5456	possible input token. As you read the successive trace messages, you
	5457	can see that the parser is functioning according to its specification in
	5458	the listing file. Eventually you will arrive at the place where
	5459	something undesirable happens, and you will see which parts of the
	5460	grammar are to blame.
	5461
	5462	The parser file is a C program and you can use C debuggers on it, but it's
	5463	not easy to interpret what it is doing. The parser function is a
	5464	finite-state machine interpreter, and aside from the actions it executes
	5465	the same code over and over. Only the values of variables show where in
	5466	the grammar it is working.
	5467
	5468	@findex YYPRINT
	5469	The debugging information normally gives the token type of each token
	5470	read, but not its semantic value. You can optionally define a macro
	5471	named @code{YYPRINT} to provide a way to print the value. If you define
	5472	@code{YYPRINT}, it should take three arguments. The parser will pass a
	5473	standard I/O stream, the numeric code for the token type, and the token
	5474	value (from @code{yylval}).
	5475
	5476	Here is an example of @code{YYPRINT} suitable for the multi-function
	5477	calculator (@pxref{Mfcalc Decl, ,Declarations for @code{mfcalc}}):
	5478
	5479	@smallexample
	5480	#define YYPRINT(file, type, value) yyprint (file, type, value)
	5481
	5482	static void
	5483	yyprint (FILE *file, int type, YYSTYPE value)
	5484	@{
	5485	if (type == VAR)
	5486	fprintf (file, " %s", value.tptr->name);
	5487	else if (type == NUM)
	5488	fprintf (file, " %g", value.val);
	5489	@}
	5490	@end smallexample
	5491
	5492	@c ================================================= Invoking Bison
	5493
	5494	@node Invocation
	5495	@chapter Invoking Bison
	5496	@cindex invoking Bison
	5497	@cindex Bison invocation
	5498	@cindex options for invoking Bison
	5499
	5500	The usual way to invoke Bison is as follows:
	5501
	5502	@example
	5503	bison @var{infile}
	5504	@end example
	5505
	5506	Here @var{infile} is the grammar file name, which usually ends in
	5507	@samp{.y}. The parser file's name is made by replacing the @samp{.y}
	5508	with @samp{.tab.c}. Thus, the @samp{bison foo.y} filename yields
	5509	@file{foo.tab.c}, and the @samp{bison hack/foo.y} filename yields
	5510	@file{hack/foo.tab.c}. It is also possible, in case you are writing
	5511	C++ code instead of C in your grammar file, to name it @file{foo.ypp}
	5512	or @file{foo.y++}. Then, the output files will take an extension like
	5513	the given one as input (respectively @file{foo.tab.cpp} and @file{foo.tab.c++}).
	5514	This feature takes effect with all options that manipulate filenames like
	5515	@samp{-o} or @samp{-d}.
	5516
	5517	For example:
	5518
	5519	@example
	5520	bison -d @var{infile.yxx}
	5521	@end example
	5522	@noindent
	5523	will produce @file{infile.tab.cxx} and @file{infile.tab.hxx}, and
	5524
	5525	@example
	5526	bison -d @var{infile.y} -o @var{output.c++}
	5527	@end example
	5528	@noindent
	5529	will produce @file{output.c++} and @file{output.h++}.
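
The default naming rule can be summarized by a small C function. This is a sketch of the rule as described above, not Bison's own code; the name @code{default_parser_name} is hypothetical, and the function handles only extensions that begin with @samp{y}.

```c
#include <string.h>

/* Build the default parser file name for grammar file INFILE into OUT.
   Returns 0 on success, -1 if INFILE has no `.y...' extension or OUT
   is too small.  */
int
default_parser_name (const char *infile, char *out, size_t outsize)
{
  const char *dot = strrchr (infile, '.');
  size_t stem, n;

  if (dot == NULL || dot[1] != 'y')
    return -1;
  stem = (size_t) (dot - infile);
  n = stem + strlen (".tab.") + strlen (dot + 1);
  if (n + 1 > outsize)
    return -1;

  memcpy (out, infile, stem);      /* everything before the extension */
  strcpy (out + stem, ".tab.");
  strcat (out, dot + 1);           /* the original extension */
  /* Turn the leading `y' of the extension into `c': y -> c, ypp -> cpp.  */
  out[stem + strlen (".tab.")] = 'c';
  return 0;
}
```

Given @file{foo.ypp} it produces @file{foo.tab.cpp}, and given @file{hack/foo.y} it produces @file{hack/foo.tab.c}.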
	5530
	5531
	5532	@menu
	5533	* Bison Options:: All the options described in detail,
	5534	in alphabetical order by short options.
	5535	* Environment Variables:: Variables which affect Bison execution.
	5536	* Option Cross Key:: Alphabetical list of long options.
	5537	* VMS Invocation:: Bison command syntax on VMS.
	5538	@end menu
	5539
	5540	@node Bison Options
	5541	@section Bison Options
	5542
	5543	Bison supports both traditional single-letter options and mnemonic long
	5544	option names. Long option names are indicated with @samp{--} instead of
	5545	@samp{-}. Abbreviations for option names are allowed as long as they
	5546	are unique. When a long option takes an argument, like
	5547	@samp{--file-prefix}, connect the option name and the argument with
	5548	@samp{=}.
	5549
	5550	Here is a list of options that can be used with Bison, alphabetized by
	5551	short option. It is followed by a cross key alphabetized by long
	5552	option.
	5553
	5554	@c Please, keep this ordered as in `bison --help'.
	5555	@noindent
	5556	Operation modes:
	5557	@table @option
	5558	@item -h
	5559	@itemx --help
	5560	Print a summary of the command-line options to Bison and exit.
	5561
	5562	@item -V
	5563	@itemx --version
	5564	Print the version number of Bison and exit.
	5565
	5566	@need 1750
	5567	@item -y
	5568	@itemx --yacc
	5569	Equivalent to @samp{-o y.tab.c}; the parser output file is called
	5570	@file{y.tab.c}, and the other outputs are called @file{y.output} and
	5571	@file{y.tab.h}. The purpose of this option is to imitate Yacc's output
	5572	file name conventions. Thus, the following shell script can substitute
	5573	for Yacc:
	5574
	5575	@example
	5576	bison -y "$@@"
	5577	@end example
	5578	@end table
	5579
	5580	@noindent
	5581	Tuning the parser:
	5582
	5583	@table @option
	5584	@item -S @var{file}
	5585	@itemx --skeleton=@var{file}
	5586	Specify the skeleton to use. You probably don't need this option unless
	5587	you are developing Bison.
	5588
	5589	@item -t
	5590	@itemx --debug
	5591	In the parser file, define the macro @code{YYDEBUG} to 1 if it is not
	5592	already defined, so that the debugging facilities are compiled.
	5593	@xref{Tracing, ,Tracing Your Parser}.
	5594
	5595	@item --locations
	5596	Pretend that @code{%locations} was specified. @xref{Decl Summary}.
	5597
	5598	@item -p @var{prefix}
	5599	@itemx --name-prefix=@var{prefix}
	5600	Pretend that @code{%name-prefix="@var{prefix}"} was specified.
	5601	@xref{Decl Summary}.
	5602
	5603	@item -l
	5604	@itemx --no-lines
	5605	Don't put any @code{#line} preprocessor commands in the parser file.
	5606	Ordinarily Bison puts them in the parser file so that the C compiler
	5607	and debuggers will associate errors with your source file, the
	5608	grammar file. This option causes them to associate errors with the
	5609	parser file, treating it as an independent source file in its own right.
	5610
	5611	@item -n
	5612	@itemx --no-parser
	5613	Pretend that @code{%no-parser} was specified. @xref{Decl Summary}.
	5614
	5615	@item -k
	5616	@itemx --token-table
	5617	Pretend that @code{%token-table} was specified. @xref{Decl Summary}.
	5618	@end table
	5619
	5620	@noindent
	5621	Adjust the output:
	5622
	5623	@table @option
	5624	@item -d
	5625	@itemx --defines
	5626	Pretend that @code{%defines} was specified, i.e., write an extra output
	5627	file containing macro definitions for the token type names defined in
	5628	the grammar and the semantic value type @code{YYSTYPE}, as well as a few
	5629	@code{extern} variable declarations. @xref{Decl Summary}.
	5630
	5631	@item --defines=@var{defines-file}
	5632	Same as above, but save in the file @var{defines-file}.
	5633
	5634	@item -b @var{file-prefix}
	5635	@itemx --file-prefix=@var{prefix}
	5636	Pretend that @code{%file-prefix} was specified, i.e., specify prefix to use
	5637	for all Bison output file names. @xref{Decl Summary}.
	5638
	5639	@item -r @var{things}
	5640	@itemx --report=@var{things}
	5641	Write an extra output file containing a verbose description of the
	5642	comma-separated list of @var{things} among:
	5643
	5644	@table @code
	5645	@item state
	5646	Description of the grammar, conflicts (resolved and unresolved), and
	5647	LALR automaton.
	5648
	5649	@item lookahead
	5650	Implies @code{state} and augments the description of the automaton with
	5651	each rule's lookahead set.
	5652
	5653	@item itemset
	5654	Implies @code{state} and augments the description of the automaton with
	5655	the full set of items for each state, instead of its core only.
	5656	@end table
	5657
	5658	For instance, on the following grammar
	5659
	5660	@item -v
	5661	@itemx --verbose
	5662	Pretend that @code{%verbose} was specified, i.e., write an extra output
	5663	file containing verbose descriptions of the grammar and
	5664	parser. @xref{Decl Summary}.
	5665
	5666	@item -o @var{filename}
	5667	@itemx --output=@var{filename}
	5668	Specify the @var{filename} for the parser file.
	5669
	5670	The other output files' names are constructed from @var{filename} as
	5671	described under the @samp{-v} and @samp{-d} options.
	5672
	5673	@item -g
	5674	Output a VCG definition of the LALR(1) grammar automaton computed by
	5675	Bison. If the grammar file is @file{foo.y}, the VCG output file will
	5676	be @file{foo.vcg}.
	5677
	5678	@item --graph=@var{graph-file}
	5679	The behaviour of @option{--graph} is the same as that of @samp{-g}. The only
	5680	difference is that it takes an optional argument, which is the name of
	5681	the output graph file.
	5682	@end table
	5683
	5684	@node Environment Variables
	5685	@section Environment Variables
	5686	@cindex environment variables
	5687	@cindex BISON_SIMPLE
	5688
	5689	Here is a list of environment variables which affect the way Bison
	5690	runs.
	5691
	5692	@table @samp
	5693	@item BISON_SIMPLE
	5694	Much of the parser generated by Bison is copied verbatim from a file
	5695	called @file{bison.simple}. If Bison cannot find that file, or if you
	5696	would like to direct Bison to use a different copy, setting the
	5697	environment variable @code{BISON_SIMPLE} to the path of the file will
	5698	cause Bison to use that copy instead.
	5699	@end table
	5700
	5701	@node Option Cross Key
	5702	@section Option Cross Key
	5703
	5704	Here is a list of options, alphabetized by long option, to help you find
	5705	the corresponding short option.
	5706
	5707	@tex
	5708	\def\leaderfill{\leaders\hbox to 1em{\hss.\hss}\hfill}
	5709
	5710	{\tt
	5711	\line{ --debug \leaderfill -t}
	5712	\line{ --defines \leaderfill -d}
	5713	\line{ --file-prefix \leaderfill -b}
	5714	\line{ --graph \leaderfill -g}
	5715	\line{ --help \leaderfill -h}
	5716	\line{ --name-prefix \leaderfill -p}
	5717	\line{ --no-lines \leaderfill -l}
	5718	\line{ --no-parser \leaderfill -n}
	5719	\line{ --output \leaderfill -o}
	5720	\line{ --token-table \leaderfill -k}
	5721	\line{ --verbose \leaderfill -v}
	5722	\line{ --version \leaderfill -V}
	5723	\line{ --yacc \leaderfill -y}
	5724	}
	5725	@end tex
	5726
	5727	@ifinfo
	5728	@example
	5729	--debug -t
	5730	--defines=@var{defines-file} -d
	5731	--file-prefix=@var{prefix} -b @var{file-prefix}
	5732	--graph=@var{graph-file} -g
	5733	--help -h
	5734	--name-prefix=@var{prefix} -p @var{name-prefix}
	5735	--no-lines -l
	5736	--no-parser -n
	5737	--output=@var{outfile} -o @var{outfile}
	5738	--token-table -k
	5739	--verbose -v
	5740	--version -V
	5741	--yacc -y
	5742	@end example
	5743	@end ifinfo
	5744
	5745	@node VMS Invocation
	5746	@section Invoking Bison under VMS
	5747	@cindex invoking Bison under VMS
	5748	@cindex VMS
	5749
	5750	The command line syntax for Bison on VMS is a variant of the usual
	5751	Bison command syntax---adapted to fit VMS conventions.
	5752
	5753	To find the VMS equivalent for any Bison option, start with the long
	5754	option, and substitute a @samp{/} for the leading @samp{--}, and
	5755	substitute a @samp{_} for each @samp{-} in the name of the long option.
	5756	For example, the following invocation under VMS:
	5757
	5758	@example
	5759	bison /debug/name_prefix=bar foo.y
	5760	@end example
	5761
	5762	@noindent
	5763	is equivalent to the following command under POSIX:
	5764
	5765	@example
	5766	bison --debug --name-prefix=bar foo.y
	5767	@end example
	5768
	5769	The VMS file system does not permit filenames such as
	5770	@file{foo.tab.c}. In the above example, the output file
	5771	would instead be named @file{foo_tab.c}.
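
The option substitution rule described in this section can be sketched as a small C function. It is a minimal illustration only: the name @code{vms_option} and its interface are hypothetical, and the sketch assumes that text after an @samp{=} is an argument and is therefore left unchanged.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: rewrite a long option such as
   "--name-prefix=bar" into its VMS form "/name_prefix=bar".
   Returns 0 on success, or -1 if OPTION does not start with "--"
   or OUT is too small to hold the result.  */
int
vms_option (const char *option, char *out, size_t out_size)
{
  size_t len, i;
  int in_name = 1;

  if (strncmp (option, "--", 2) != 0)
    return -1;
  option += 2;                  /* Skip the leading "--".  */
  len = strlen (option);
  if (out_size < len + 2)       /* '/' + text + terminating NUL.  */
    return -1;

  out[0] = '/';                 /* '/' replaces the leading "--".  */
  for (i = 0; i < len; i++)
    {
      char c = option[i];
      if (c == '=')
        in_name = 0;            /* The argument is copied as is.  */
      out[i + 1] = (in_name && c == '-') ? '_' : c;
    }
  out[len + 1] = '\0';
  return 0;
}
```

Applied to @samp{--debug} and @samp{--name-prefix=bar}, the function produces @samp{/debug} and @samp{/name_prefix=bar}, matching the invocation shown above.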
	5772
	5773	@node Table of Symbols
	5774	@appendix Bison Symbols
	5775	@cindex Bison symbols, table of
	5776	@cindex symbols in Bison, table of
	5777
	5778	@table @code
	5779	@item @@$
	5780	In an action, the location of the left-hand side of the rule.
	5781	@xref{Locations, , Locations Overview}.
	5782
	5783	@item @@@var{n}
	5784	In an action, the location of the @var{n}-th symbol of the right-hand
	5785	side of the rule. @xref{Locations, , Locations Overview}.
	5786
	5787	@item $$
	5788	In an action, the semantic value of the left-hand side of the rule.
	5789	@xref{Actions}.
	5790
	5791	@item $@var{n}
	5792	In an action, the semantic value of the @var{n}-th symbol of the
	5793	right-hand side of the rule. @xref{Actions}.
	5794
	5795	@item error
	5796	A token name reserved for error recovery. This token may be used in
	5797	grammar rules so as to allow the Bison parser to recognize an error in
	5798	the grammar without halting the process. In effect, a sentence
	5799	containing an error may be recognized as valid. On a parse error, the
	5800	token @code{error} becomes the current look-ahead token. Actions
	5801	corresponding to @code{error} are then executed, and the look-ahead
	5802	token is reset to the token that originally caused the violation.
	5803	@xref{Error Recovery}.
	5804
	5805	@item YYABORT
	5806	Macro to pretend that an unrecoverable syntax error has occurred, by
	5807	making @code{yyparse} return 1 immediately. The error reporting
	5808	function @code{yyerror} is not called. @xref{Parser Function, ,The
	5809	Parser Function @code{yyparse}}.
	5810
	5811	@item YYACCEPT
	5812	Macro to pretend that a complete utterance of the language has been
	5813	read, by making @code{yyparse} return 0 immediately.
	5814	@xref{Parser Function, ,The Parser Function @code{yyparse}}.
	5815
	5816	@item YYBACKUP
	5817	Macro to discard a value from the parser stack and fake a look-ahead
	5818	token. @xref{Action Features, ,Special Features for Use in Actions}.
	5819
	5820	@item YYDEBUG
	5821	Macro to define in order to equip the parser with tracing code. @xref{Tracing,
	5822	,Tracing Your Parser}.
	5823
	5824	@item YYERROR
	5825	Macro to pretend that a syntax error has just been detected: call
	5826	@code{yyerror} and then perform normal error recovery if possible
	5827	(@pxref{Error Recovery}), or (if recovery is impossible) make
	5828	@code{yyparse} return 1. @xref{Error Recovery}.
	5829
	5830	@item YYERROR_VERBOSE
	5831	Macro that you define with @code{#define} in the Bison declarations
	5832	section to request verbose, specific error message strings when
	5833	@code{yyerror} is called.
	5834
	5835	@item YYINITDEPTH
	5836	Macro for specifying the initial size of the parser stack.
	5837	@xref{Stack Overflow}.
	5838
	5839	@item YYLEX_PARAM
	5840	Macro for specifying an extra argument (or list of extra arguments) for
	5841	@code{yyparse} to pass to @code{yylex}. @xref{Pure Calling,, Calling
	5842	Conventions for Pure Parsers}.
	5843
	5844	@item YYLTYPE
	5845	Macro for the data type of @code{yylloc}; a structure with four
	5846	members. @xref{Location Type, , Data Types of Locations}.
	5847
	5848	@item yyltype
	5849	Default value for @code{YYLTYPE}.
	5850
	5851	@item YYMAXDEPTH
	5852	Macro for specifying the maximum size of the parser stack.
	5853	@xref{Stack Overflow}.
	5854
	5855	@item YYPARSE_PARAM
	5856	Macro for specifying the name of a parameter that @code{yyparse} should
	5857	accept. @xref{Pure Calling,, Calling Conventions for Pure Parsers}.
	5858
	5859	@item YYRECOVERING
	5860	Macro whose value indicates whether the parser is recovering from a
	5861	syntax error. @xref{Action Features, ,Special Features for Use in Actions}.
	5862
	5863	@item YYSTACK_USE_ALLOCA
	5864	Macro used to control the use of @code{alloca}. If defined to @samp{0},
	5865	the parser will not use @code{alloca} but @code{malloc} when trying to
	5866	grow its internal stacks. Do @emph{not} define @code{YYSTACK_USE_ALLOCA}
	5867	to anything else.
	5868
	5869	@item YYSTYPE
	5870	Macro for the data type of semantic values; @code{int} by default.
	5871	@xref{Value Type, ,Data Types of Semantic Values}.
	5872
	5873	@item yychar
	5874	External integer variable that contains the integer value of the current
	5875	look-ahead token. (In a pure parser, it is a local variable within
	5876	@code{yyparse}.) Error-recovery rule actions may examine this variable.
	5877	@xref{Action Features, ,Special Features for Use in Actions}.
	5878
	5879	@item yyclearin
	5880	Macro used in error-recovery rule actions. It clears the previous
	5881	look-ahead token. @xref{Error Recovery}.
	5882
	5883	@item yydebug
	5884	External integer variable set to zero by default. If @code{yydebug}
	5885	is given a nonzero value, the parser will output information on input
	5886	symbols and parser action. @xref{Tracing, ,Tracing Your Parser}.
	5887
	5888	@item yyerrok
	5889	Macro to cause the parser to recover immediately to its normal mode
	5890	after a parse error. @xref{Error Recovery}.
	5891
	5892	@item yyerror
	5893	User-supplied function to be called by @code{yyparse} on error. The
	5894	function receives one argument, a pointer to a character string
	5895	containing an error message. @xref{Error Reporting, ,The Error
	5896	Reporting Function @code{yyerror}}.
	5897
	5898	@item yylex
	5899	User-supplied lexical analyzer function, called with no arguments to get
	5900	the next token. @xref{Lexical, ,The Lexical Analyzer Function
	5901	@code{yylex}}.
	5902
	5903	@item yylval
	5904	External variable in which @code{yylex} should place the semantic
	5905	value associated with a token. (In a pure parser, it is a local
	5906	variable within @code{yyparse}, and its address is passed to
	5907	@code{yylex}.) @xref{Token Values, ,Semantic Values of Tokens}.
	5908
	5909	@item yylloc
	5910	External variable in which @code{yylex} should place the line and column
	5911	numbers associated with a token. (In a pure parser, it is a local
	5912	variable within @code{yyparse}, and its address is passed to
	5913	@code{yylex}.) You can ignore this variable if you don't use the
	5914	@samp{@@} feature in the grammar actions. @xref{Token Positions,
	5915	,Textual Positions of Tokens}.
	5916
	5917	@item yynerrs
	5918	Global variable which Bison increments each time there is a parse error.
	5919	(In a pure parser, it is a local variable within @code{yyparse}.)
	5920	@xref{Error Reporting, ,The Error Reporting Function @code{yyerror}}.
	5921
	5922	@item yyparse
	5923	The parser function produced by Bison; call this function to start
	5924	parsing. @xref{Parser Function, ,The Parser Function @code{yyparse}}.
	5925
	5926	@item %debug
	5927	Equip the parser for debugging. @xref{Decl Summary}.
	5928
	5929	@item %defines
	5930	Bison declaration to create a header file meant for the scanner.
	5931	@xref{Decl Summary}.
	5932
	5933	@item %file-prefix="@var{prefix}"
	5934	Bison declaration to set the prefix of the output files. @xref{Decl
	5935	Summary}.
	5936
	5937	@c @item %source-extension
	5938	@c Bison declaration to specify the generated parser output file extension.
	5939	@c @xref{Decl Summary}.
	5940	@c
	5941	@c @item %header-extension
	5942	@c Bison declaration to specify the generated parser header file extension
	5943	@c if required. @xref{Decl Summary}.
	5944
	5945	@item %left
	5946	Bison declaration to assign left associativity to token(s).
	5947	@xref{Precedence Decl, ,Operator Precedence}.
	5948
	5949	@item %name-prefix="@var{prefix}"
	5950	Bison declaration to rename the external symbols. @xref{Decl Summary}.
	5951
	5952	@item %no-lines
	5953	Bison declaration to avoid generating @code{#line} directives in the
	5954	parser file. @xref{Decl Summary}.
	5955
	5956	@item %nonassoc
	5957	Bison declaration to assign non-associativity to token(s).
	5958	@xref{Precedence Decl, ,Operator Precedence}.
	5959
	5960	@item %output="@var{filename}"
	5961	Bison declaration to set the name of the parser file. @xref{Decl
	5962	Summary}.
	5963
	5964	@item %prec
	5965	Bison declaration to assign a precedence to a specific rule.
	5966	@xref{Contextual Precedence, ,Context-Dependent Precedence}.
	5967
	5968	@item %pure-parser
	5969	Bison declaration to request a pure (reentrant) parser.
	5970	@xref{Pure Decl, ,A Pure (Reentrant) Parser}.
	5971
	5972	@item %right
	5973	Bison declaration to assign right associativity to token(s).
	5974	@xref{Precedence Decl, ,Operator Precedence}.
	5975
	5976	@item %start
	5977	Bison declaration to specify the start symbol. @xref{Start Decl, ,The
	5978	Start-Symbol}.
	5979
	5980	@item %token
	5981	Bison declaration to declare token(s) without specifying precedence.
	5982	@xref{Token Decl, ,Token Type Names}.
	5983
	5984	@item %token-table
	5985	Bison declaration to include a token name table in the parser file.
	5986	@xref{Decl Summary}.
	5987
	5988	@item %type
	5989	Bison declaration to declare nonterminals. @xref{Type Decl,
	5990	,Nonterminal Symbols}.
	5991
	5992	@item %union
	5993	Bison declaration to specify several possible data types for semantic
	5994	values. @xref{Union Decl, ,The Collection of Value Types}.
	5995	@end table
	5996
	5997	@sp 1
	5998
	5999	These are the punctuation and delimiters used in Bison input:
	6000
	6001	@table @samp
	6002	@item %%
	6003	Delimiter used to separate the grammar rule section from the
	6004	Bison declarations section or the epilogue.
	6005	@xref{Grammar Layout, ,The Overall Layout of a Bison Grammar}.
	6006
	6007	@item %@{ %@}
	6008	All code listed between @samp{%@{} and @samp{%@}} is copied directly to
	6009	the output file uninterpreted. Such code forms the prologue of the input
	6010	file. @xref{Grammar Outline, ,Outline of a Bison
	6011	Grammar}.
	6012
	6013	@item /*@dots{}*/
	6014	Comment delimiters, as in C.
	6015
	6016	@item :
	6017	Separates a rule's result from its components. @xref{Rules, ,Syntax of
	6018	Grammar Rules}.
	6019
	6020	@item ;
	6021	Terminates a rule. @xref{Rules, ,Syntax of Grammar Rules}.
	6022
	6023	@item |
	6024	Separates alternate rules for the same result nonterminal.
	6025	@xref{Rules, ,Syntax of Grammar Rules}.
	6026	@end table
	6027
	6028	@node Glossary
	6029	@appendix Glossary
	6030	@cindex glossary
	6031
	6032	@table @asis
	6033	@item Backus-Naur Form (BNF)
	6034	Formal method of specifying context-free grammars. BNF was first used
	6035	in the @cite{ALGOL-60} report, 1963. @xref{Language and Grammar,
	6036	,Languages and Context-Free Grammars}.
	6037
	6038	@item Context-free grammars
	6039	Grammars specified as rules that can be applied regardless of context.
	6040	Thus, if there is a rule which says that an integer can be used as an
	6041	expression, integers are allowed @emph{anywhere} an expression is
	6042	permitted. @xref{Language and Grammar, ,Languages and Context-Free
	6043	Grammars}.
	6044
	6045	@item Dynamic allocation
	6046	Allocation of memory that occurs during execution, rather than at
	6047	compile time or on entry to a function.
	6048
	6049	@item Empty string
	6050	Analogous to the empty set in set theory, the empty string is a
	6051	character string of length zero.
	6052
	6053	@item Finite-state stack machine
	6054	A ``machine'' that has discrete states in which it is said to exist at
	6055	each instant in time. As input to the machine is processed, the
	6056	machine moves from state to state as specified by the logic of the
	6057	machine. In the case of the parser, the input is the language being
	6058	parsed, and the states correspond to various stages in the grammar
	6059	rules. @xref{Algorithm, ,The Bison Parser Algorithm}.
	6060
	6061	@item Grouping
	6062	A language construct that is (in general) grammatically divisible;
	6063	for example, `expression' or `declaration' in C.
	6064	@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
	6065
	6066	@item Infix operator
	6067	An arithmetic operator that is placed between the operands on which it
	6068	performs some operation.
	6069
	6070	@item Input stream
	6071	A continuous flow of data between devices or programs.
	6072
	6073	@item Language construct
	6074	One of the typical usage schemas of the language. For example, one of
	6075	the constructs of the C language is the @code{if} statement.
	6076	@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
	6077
	6078	@item Left associativity
	6079	Operators having left associativity are analyzed from left to right:
	6080	@samp{a+b+c} first computes @samp{a+b} and then combines with
	6081	@samp{c}. @xref{Precedence, ,Operator Precedence}.
	6082
	6083	@item Left recursion
	6084	A rule whose result symbol is also its first component symbol; for
	6085	example, @samp{expseq1 : expseq1 ',' exp;}. @xref{Recursion, ,Recursive
	6086	Rules}.
	6087
	6088	@item Left-to-right parsing
	6089	Parsing a sentence of a language by analyzing it token by token from
	6090	left to right. @xref{Algorithm, ,The Bison Parser Algorithm}.
	6091
	6092	@item Lexical analyzer (scanner)
	6093	A function that reads an input stream and returns tokens one by one.
	6094	@xref{Lexical, ,The Lexical Analyzer Function @code{yylex}}.
	6095
	6096	@item Lexical tie-in
	6097	A flag, set by actions in the grammar rules, which alters the way
	6098	tokens are parsed. @xref{Lexical Tie-ins}.
	6099
	6100	@item Literal string token
	6101	A token which consists of two or more fixed characters. @xref{Symbols}.
	6102
	6103	@item Look-ahead token
	6104	A token already read but not yet shifted. @xref{Look-Ahead, ,Look-Ahead
	6105	Tokens}.
	6106
	6107	@item LALR(1)
	6108	The class of context-free grammars that Bison (like most other parser
	6109	generators) can handle; a subset of LR(1). @xref{Mystery Conflicts, ,
	6110	Mysterious Reduce/Reduce Conflicts}.
	6111
	6112	@item LR(1)
	6113	The class of context-free grammars in which at most one token of
	6114	look-ahead is needed to disambiguate the parsing of any piece of input.
	6115
	6116	@item Nonterminal symbol
	6117	A grammar symbol standing for a grammatical construct that can
	6118	be expressed through rules in terms of smaller constructs; in other
	6119	words, a construct that is not a token. @xref{Symbols}.
	6120
	6121	@item Parse error
	6122	An error encountered during parsing of an input stream due to invalid
	6123	syntax. @xref{Error Recovery}.
	6124
	6125	@item Parser
	6126	A function that recognizes valid sentences of a language by analyzing
	6127	the syntax structure of a set of tokens passed to it from a lexical
	6128	analyzer.
	6129
	6130	@item Postfix operator
	6131	An arithmetic operator that is placed after the operands upon which it
	6132	performs some operation.
	6133
	6134	@item Reduction
	6135	Replacing a string of nonterminals and/or terminals with a single
	6136	nonterminal, according to a grammar rule. @xref{Algorithm, ,The Bison
	6137	Parser Algorithm}.
	6138
	6139	@item Reentrant
	6140	A reentrant subprogram is a subprogram which can be invoked any
	6141	number of times in parallel, without interference between the various
	6142	invocations. @xref{Pure Decl, ,A Pure (Reentrant) Parser}.
	6143
	6144	@item Reverse Polish notation
	6145	A language in which all operators are postfix operators.
	6146
	6147	@item Right recursion
	6148	A rule whose result symbol is also its last component symbol; for
	6149	example, @samp{expseq1: exp ',' expseq1;}. @xref{Recursion, ,Recursive
	6150	Rules}.
	6151
	6152	@item Semantics
	6153	In computer languages, the semantics are specified by the actions
	6154	taken for each instance of the language, i.e., the meaning of
	6155	each statement. @xref{Semantics, ,Defining Language Semantics}.
	6156
	6157	@item Shift
	6158	A parser is said to shift when it chooses to analyze further input
	6159	from the stream rather than immediately reducing some
	6160	already-recognized rule. @xref{Algorithm, ,The Bison Parser Algorithm}.
	6161
	6162	@item Single-character literal
	6163	A single character that is recognized and interpreted as is.
	6164	@xref{Grammar in Bison, ,From Formal Rules to Bison Input}.
	6165
	6166	@item Start symbol
	6167	The nonterminal symbol that stands for a complete valid utterance in
	6168	the language being parsed. The start symbol is usually listed as the
	6169	first nonterminal symbol in a language specification.
	6170	@xref{Start Decl, ,The Start-Symbol}.
	6171
	6172	@item Symbol table
	6173	A data structure where symbol names and associated data are stored
	6174	during parsing to allow for recognition and use of existing
	6175	information in repeated uses of a symbol. @xref{Multi-function Calc}.
	6176
	6177	@item Token
	6178	A basic, grammatically indivisible unit of a language. The symbol
	6179	that describes a token in the grammar is a terminal symbol.
	6180	The input of the Bison parser is a stream of tokens which comes from
	6181	the lexical analyzer. @xref{Symbols}.
	6182
	6183	@item Terminal symbol
	6184	A grammar symbol that has no rules in the grammar and therefore is
	6185	grammatically indivisible. The piece of text it represents is a token.
	6186	@xref{Language and Grammar, ,Languages and Context-Free Grammars}.
	6187	@end table
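
To make the glossary entries for postfix operators and reverse Polish notation concrete, here is a minimal C sketch of an evaluator for such expressions. It is an illustration written for this glossary, not part of Bison: the function name @code{rpn_eval}, the fixed stack limit, and the restriction to non-negative integers with the operators @samp{+}, @samp{-}, @samp{*} and @samp{/} are all assumptions of the example.

```c
#include <ctype.h>
#include <stdlib.h>

#define RPN_STACK_MAX 64        /* Arbitrary limit for this sketch.  */

/* Evaluate TEXT, a reverse Polish expression made of non-negative
   integers and the operators + - * /, separated by white space.
   Store the value in *RESULT and return 0, or return -1 if the
   input is malformed.  */
int
rpn_eval (const char *text, long *result)
{
  long stack[RPN_STACK_MAX];
  int depth = 0;

  while (*text != '\0')
    {
      if (isspace ((unsigned char) *text))
        text++;
      else if (isdigit ((unsigned char) *text))
        {
          char *end;
          if (depth == RPN_STACK_MAX)
            return -1;
          stack[depth++] = strtol (text, &end, 10);
          text = end;
        }
      else
        {
          long left, right;
          if (depth < 2)        /* A postfix operator needs two operands.  */
            return -1;
          right = stack[--depth];
          left = stack[--depth];
          switch (*text++)
            {
            case '+': stack[depth++] = left + right; break;
            case '-': stack[depth++] = left - right; break;
            case '*': stack[depth++] = left * right; break;
            case '/':
              if (right == 0)
                return -1;
              stack[depth++] = left / right;
              break;
            default:
              return -1;
            }
        }
    }
  if (depth != 1)               /* Exactly one value must remain.  */
    return -1;
  *result = stack[0];
  return 0;
}
```

Because each operator follows its operands, no precedence or associativity rules are needed: the input @samp{8 3 - 2 -} yields 3, the same value that left associativity gives the infix expression @samp{8-3-2}.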
	6188
	6189	@node Copying This Manual
	6190	@appendix Copying This Manual
	6191
	6192	@menu
	6193	* GNU Free Documentation License:: License for copying this manual.
	6194	@end menu
	6195
	6196	@include fdl.texi
	6197
	6198	@node Index
	6199	@unnumbered Index
	6200
	6201	@printindex cp
	6202
	6203	@bye