-- outline --

* Various
** YYPRINT
glr.c inherits its symbol_print function from c.m4, which supports
YYPRINT. But YYPRINT requires yytoknum, which glr.c does not define.

Anyway, IMHO YYPRINT is obsolete and should be restricted to yacc.c.

** YYERRCODE
Defined to 256, but neither used nor documented. It is probably the
token number of the error token, which POSIX wants to be 256, but
which Bison might renumber if the user used number 256. Keep it, fix
it and document it? Throw it away?

We could (should?) also handle the case of the undef_token, which is
numbered 257 for yylex, and 2 internally. Both appear, for instance,
in toknum:

  const unsigned short int
  parser::yytoken_number_[] =
  {
    0, 256, 257, 258, 259, 260, 261, 262, 263, 264,

whereas here

  enum yytokentype {
    TOK_EOF = 0,
    TOK_EQ = 258,

so both 256 and 257 are "mysterious".

  const char*
  const parser::yytname_[] =
  {
    "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",

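The two numberings can be made concrete with a small mapping from the numbers yylex returns to the internal symbol numbers that index yytname_. This is a minimal sketch, not Bison's code. The names `toknum`, `tname`, `translate`, `symbol_name` and `UNDEF_SYMBOL` are hypothetical, the tables are truncated copies of the excerpts above, and generated parsers use a precomputed yytranslate table rather than a linear search.

```c
#include <stddef.h>

/* External token numbers, indexed by internal symbol number: a truncated
   copy of yytoken_number_ above.  Entry 1 (256) is the error token and
   entry 2 (257) the undefined token.  */
static const int toknum[] = { 0, 256, 257, 258, 259 };

/* Internal symbol names in the same order: a truncated copy of yytname_. */
static const char *const tname[] = {
  "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\""
};

/* Hypothetical constant: the internal number of $undefined.  */
enum { UNDEF_SYMBOL = 2 };

/* Map a number returned by yylex to an internal symbol number; numbers
   the grammar does not know become $undefined.  */
static int
translate (int external)
{
  size_t i;
  for (i = 0; i < sizeof toknum / sizeof toknum[0]; ++i)
    if (toknum[i] == external)
      return (int) i;
  return UNDEF_SYMBOL;
}

/* Name used in diagnostics for a number returned by yylex.  */
static const char *
symbol_name (int external)
{
  return tname[translate (external)];
}
```

With these tables, translate (256) yields 1 (error) and translate (258) yields 3 ("="). Both 257 and an unknown number such as 1000 yield 2, the internal number of $undefined.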

** YYFAIL
It seems to be really obsolete now; shall we remove it?

** YYBACKUP
There is no test for it, no examples in the doc, and I'm not sure
what it should look like. For instance, what follows crashes.

  %error-verbose
  %debug
  %pure-parser
  %code {
  # include <stdio.h>
  # include <stdlib.h>
  # include <assert.h>

  static void yyerror (const char *msg);
  static int yylex (YYSTYPE *yylval);
  }
  %%
  exp:
    'a'   { printf ("a: %d\n", $1); }
  | 'b'   { YYBACKUP('a', 123); }
  ;
  %%
  static int
  yylex (YYSTYPE *yylval)
  {
    static char const input[] = "b";
    static size_t toknum;
    assert (toknum < sizeof input);
    *yylval = (toknum + 1) * 10;
    return input[toknum++];
  }

  static void
  yyerror (const char *msg)
  {
    fprintf (stderr, "%s\n", msg);
  }

  int
  main (void)
  {
    yydebug = !!getenv("YYDEBUG");
    return yyparse ();
  }

** yychar == yyempty_
The code in yyerrlab reads:

  if (yychar <= YYEOF)
    {
      /* Return failure if at end of input. */
      if (yychar == YYEOF)
        YYABORT;
    }

There are only two values of yychar that can be <= YYEOF: YYEMPTY and
YYEOF. But I can't produce the situation where yychar is YYEMPTY
here; is it really possible? The test suite does not exercise this
case.

This shows that it would be interesting to manage to add skeleton
coverage analysis to the test suite.
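
The decision this code makes can be isolated in a few lines. This is a minimal sketch, assuming the values YYEMPTY = -2 and YYEOF = 0 used by the yacc.c skeleton. The names `errlab_step` and `errlab_action` are hypothetical, and the "discard" outcome only approximates what yacc.c does with a real token (destroy it and forget it). It makes visible that the YYEMPTY case falls through silently, which is the branch the test suite does not exercise.

```c
/* Values used by the yacc.c skeleton.  */
#define YYEMPTY (-2)
#define YYEOF 0

/* Hypothetical names for what yyerrlab does with the lookahead.  */
enum errlab_action
{
  ABORT_PARSE,        /* YYABORT: failure at end of input */
  NOTHING_TO_DISCARD, /* no lookahead was read: yychar == YYEMPTY */
  DISCARD_LOOKAHEAD   /* a real token: destroy it and forget it */
};

static enum errlab_action
errlab_step (int yychar)
{
  if (yychar <= YYEOF)
    {
      /* Only YYEOF and YYEMPTY can get here.  */
      if (yychar == YYEOF)
        return ABORT_PARSE;
      /* yychar == YYEMPTY: the branch in question.  */
      return NOTHING_TO_DISCARD;
    }
  return DISCARD_LOOKAHEAD;
}
```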

** Table definitions
It should be very easy to factor the definition of the various tables,
including the separation between declaration and definition. See for
instance b4_table_define in lalr1.cc. This way, we could even factor
C vs. C++ definitions.

* From lalr1.cc to yacc.c
** Single stack
Merging the three stacks in lalr1.cc simplified the code, prompted
other improvements and also made it faster (probably because memory
management is performed once instead of three times). I suggest that
we do the same in yacc.c.
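
In C, merging the stacks amounts to replacing three parallel arrays (states, semantic values, locations) with one array of structures, so that a single allocation grows all three at once. This is a minimal sketch. The names `stack_symbol`, `parser_stack`, `stack_push` and `stack_pop` are hypothetical, and the YYSTYPE and YYLTYPE definitions are simplified stand-ins for the real ones.

```c
#include <stdlib.h>

typedef int YYSTYPE;                                    /* assumption */
typedef struct { int first_line, last_line; } YYLTYPE;  /* simplified */

/* One entry of the merged stack: state, semantic value and location.  */
typedef struct
{
  int state;
  YYSTYPE value;
  YYLTYPE location;
} stack_symbol;

typedef struct
{
  stack_symbol *items;
  size_t size;
  size_t capacity;
} parser_stack;

/* Push an entry, growing the single array when needed: one realloc
   instead of three.  Returns 0 on success, -1 on memory exhaustion.  */
static int
stack_push (parser_stack *s, int state, YYSTYPE value, YYLTYPE location)
{
  if (s->size == s->capacity)
    {
      size_t cap = s->capacity ? 2 * s->capacity : 16;
      stack_symbol *p = realloc (s->items, cap * sizeof *p);
      if (!p)
        return -1;
      s->items = p;
      s->capacity = cap;
    }
  s->items[s->size].state = state;
  s->items[s->size].value = value;
  s->items[s->size].location = location;
  s->size++;
  return 0;
}

/* Pop N entries, as after the reduction of a rule of length N.  */
static void
stack_pop (parser_stack *s, size_t n)
{
  s->size = n < s->size ? s->size - n : 0;
}
```

A reduction by a rule of length N then becomes one stack_pop (s, N) followed by one stack_push of the new state, value and location.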

** yysyntax_error
In lalr1.cc we invoke it with the translated lookahead (yytoken),
whereas yacc.c uses yychar. I don't see why.

** yysyntax_error
The use of a switch to select yyfmt in lalr1.cc seems simpler than
what's done in yacc.c.
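
The switch-based approach picks one complete format string according to the number of names to insert, then substitutes the names. This is a minimal C sketch of the idea. The names `syntax_error_format` and `syntax_error_message` are hypothetical. The messages are assumed to follow the skeletons' wording, and translation through YY_ is omitted.

```c
#include <stddef.h>
#include <string.h>

/* Select the whole format at once.  N counts the unexpected token plus
   the expected ones (at most 4 expected).  */
static const char *
syntax_error_format (int n)
{
  switch (n)
    {
    case 0: return "syntax error";
    case 1: return "syntax error, unexpected %s";
    case 2: return "syntax error, unexpected %s, expecting %s";
    case 3: return "syntax error, unexpected %s, expecting %s or %s";
    case 4: return "syntax error, unexpected %s, expecting %s or %s or %s";
    case 5:
      return "syntax error, unexpected %s, expecting %s or %s or %s or %s";
    default: return NULL;
    }
}

/* Write into BUF (SIZE bytes) the format for N names, replacing each
   "%s" with the next element of NAMES; the output is truncated to fit.  */
static void
syntax_error_message (char *buf, size_t size, int n,
                      const char *const *names)
{
  const char *fmt = syntax_error_format (n);
  size_t len = 0;
  int i = 0;
  if (size == 0)
    return;
  if (!fmt)
    fmt = "syntax error";   /* too many names: fall back to the short form */
  while (*fmt && len + 1 < size)
    {
      if (strncmp (fmt, "%s", 2) == 0 && i < n)
        {
          const char *p = names[i++];
          while (*p && len + 1 < size)
            buf[len++] = *p++;
          fmt += 2;
        }
      else
        buf[len++] = *fmt++;
    }
  buf[len] = '\0';
}
```

For instance, with n = 3 and the names "break", "=" and "end of command" (quoted as in yytname_), the result is: syntax error, unexpected "break", expecting "=" or "end of command".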

* Header guards

From François: should we keep the directory part in the CPP guard?
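
The trade-off can be seen on a hypothetical helper that derives a guard from a header path. Keeping the directory makes the guards of same-named headers in different directories distinct, but the guard then depends on the path given to bison. `header_guard` is a hypothetical name, and this sketch does not reflect what Bison actually does. It also does not handle names starting with a digit.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Write into BUF (SIZE bytes) a CPP guard derived from PATH.  With
   KEEP_DIR == 0 the directory part is dropped first.  Characters that
   cannot appear in an identifier become '_'.  */
static void
header_guard (const char *path, int keep_dir, char *buf, size_t size)
{
  const char *p = path;
  size_t len = 0;
  if (!keep_dir)
    {
      const char *slash = strrchr (path, '/');
      if (slash)
        p = slash + 1;
    }
  while (*p && len + 1 < size)
    {
      unsigned char c = (unsigned char) *p++;
      buf[len++] = isalnum (c) ? (char) toupper (c) : '_';
    }
  if (size)
    buf[len] = '\0';
}
```

For src/parse-gram.h this gives SRC_PARSE_GRAM_H with the directory and PARSE_GRAM_H without it.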


* Yacc.c: CPP Macros

Do some people use YYPURE or YYLSP_NEEDED, as we do in the test suite?
They should not: it is not documented. But if they need to, let's
find something clean (not like YYLSP_NEEDED...).


* Installation

* Documentation
Before releasing, make sure the documentation ("Understanding your
parser") refers to the current `output' format.

* Report

** GLR
How would Paul like to display the conflicted actions? In particular,
what should happen when two reductions are possible on a given
lookahead token, but one is part of $default? Should we make the two
reductions explicit, or just keep $default? See the following point.

** Disabled Reductions
See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
what we want to do.

** Documentation
Extend with error productions. The hard part will probably be finding
the right rule so that a single state does not exhibit too many yet
undocumented ``features''. Maybe an empty action ought to be
presented too. Shall we try to make a single grammar with all these
features, or should we have several very small grammars?

** --report=conflict-path
Provide better assistance for understanding the conflicts by providing
a sample text exhibiting the (LALR) ambiguity. See the paper from
DeRemer and Pennello: they already provide the algorithm.

** Statically check for potential ambiguities in GLR grammars. See
<http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.


* Extensions

** Labeling the symbols
Have a look at the Lemon parser generator: instead of $1, $2 etc. they
can name the values. This is much more pleasant. For instance:

  exp (res): exp (a) '+' exp (b) { $res = $a + $b; };

I love this. I have been bitten too often by the removal of a symbol,
then forgetting to shift all the $n to $n-1. If you are unlucky, it
compiles...

But instead of using $a etc., we can use regular variables. And
instead of using (), I propose to use `:' (again). Paul suggests
supporting `->' in addition to `:' to separate LHS and RHS. In other
words:

  r:exp -> a:exp '+' b:exp { r = a + b; };

That requires a significant improvement of the grammar parser. Using
GLR would be nice. It also requires that Bison know the type of the
symbols (which will be useful for %include anyway). So we have some
time before...

Note that there remains the problem of locations: `@r'?


** $-1
We should find a means to provide access to values deep in the
stack. For instance, instead of

  baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }

we should be able to have:

  foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }

Or something like this.
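
What $<foo>-1 and $<bar>0 mean in terms of the value stack can be spelled out in C. This is a minimal sketch, assuming the yacc.c convention that, in a rule of length LEN whose last value is at yyvsp[0], $K is yyvsp[K - LEN]. The `value` union and the `dollar` and `baz_action` helpers are hypothetical. Zero and negative indices simply reach below the rule's own symbols, which is why they only work when the grammar guarantees what precedes baz.

```c
/* Hypothetical value type with the three members used above.  */
typedef union
{
  int foo;
  int bar;
  int qux;
} value;

/* In a rule of length LEN, with VSP pointing to the top of the value
   stack, $K is VSP[K - LEN]: $LEN is the top, while $0 and $-1 are the
   values below the rule's first symbol.  */
static const value *
dollar (const value *vsp, int len, int k)
{
  return &vsp[k - len];
}

/* The action of "baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }" (LEN = 1). */
static int
baz_action (const value *vsp)
{
  return dollar (vsp, 1, -1)->foo
         + dollar (vsp, 1, 0)->bar
         + dollar (vsp, 1, 1)->qux;
}
```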

** %if and the like
It should be possible to have %if/%else/%endif. The implementation is
not clear: should it be lexical or syntactic? Vadim Maslow thinks it
must be in the scanner: we must not parse what is in a switched-off
part of %if. Akim Demaille thinks it should be in the parser, so as
to avoid falling into another CPP mistake.

** XML Output
There are a couple of available extensions of Bison targeting some XML
output. Some day we should consider including them. One issue is
that they seem to be quite orthogonal to the parsing technique, and
seem to depend mostly on the possibility to have some code triggered
for each reduction. As a matter of fact, such hooks could also be
used to generate the yydebug traces. Some generic scheme probably
exists in there.

XML output for GNU Bison and gcc
http://www.cs.may.ie/~jpower/Research/bisonXML/

XML output for GNU Bison
http://yaxx.sourceforge.net/

* Unit rules
Maybe we could expand unit rules, i.e., transform

  exp: arith | bool;
  arith: exp '+' exp;
  bool: exp '&' exp;

into

  exp: exp '+' exp | exp '&' exp;

when there are no actions. This can significantly speed up some
grammars. I can't find the papers. In particular the book `LR
parsing: Theory and Practice' is impossible to find, but according to
`Parsing Techniques: a Practical Guide', it includes information about
this issue. Does anybody have it?
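
When no action is involved, the transformation itself is simple: a unit rule A: B is replaced by A: gamma for every rule B: gamma. This is a minimal sketch over a toy grammar representation. The names `rule`, `is_nonterminal` and `expand_unit_rules` are hypothetical, symbols are plain strings, and actions are assumed absent. It performs a single level of expansion and keeps the rules of B. As with arith and bool in the example, those rules may become useless and would then be dropped by the usual useless-rule elimination. Chains such as A: B, B: C would need repeated passes.

```c
#include <stddef.h>
#include <string.h>

#define MAX_RHS 4

/* A rule LHS: RHS[0] RHS[1] ..., with RHS NULL-terminated (so at most
   MAX_RHS - 1 symbols).  Actions are not represented: the rules are
   assumed to have none.  */
typedef struct
{
  const char *lhs;
  const char *rhs[MAX_RHS];
} rule;

/* A symbol is a nonterminal iff it is the LHS of some rule.  */
static int
is_nonterminal (const rule *g, int n, const char *sym)
{
  int i;
  for (i = 0; i < n; ++i)
    if (strcmp (g[i].lhs, sym) == 0)
      return 1;
  return 0;
}

/* Copy the N rules of G into OUT (room for MAX rules), replacing each
   unit rule A: B, where B is another nonterminal, by A: gamma for every
   rule B: gamma.  Returns the number of rules written.  */
static int
expand_unit_rules (const rule *g, int n, rule *out, int max)
{
  int i, j, m = 0;
  for (i = 0; i < n; ++i)
    {
      const char *b = g[i].rhs[0];
      int unit = b && !g[i].rhs[1] && strcmp (b, g[i].lhs) != 0
                 && is_nonterminal (g, n, b);
      if (!unit)
        {
          if (m < max)
            out[m++] = g[i];
          continue;
        }
      for (j = 0; j < n; ++j)
        if (strcmp (g[j].lhs, b) == 0 && m < max)
          {
            out[m] = g[j];
            out[m].lhs = g[i].lhs;
            ++m;
          }
    }
  return m;
}
```

On the grammar above, the output is exp: exp '+' exp, exp: exp '&' exp, and the now unreachable arith and bool rules.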



* Documentation

** History/Bibliography
Some history of Bison and some bibliography would be most welcome.
Are there any Texinfo standards for bibliography?



* Java, Fortran, etc.


* Coding system independence
Paul notes:

Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
255). It also assumes that the 8-bit character encoding is
the same for the invocation of 'bison' as it is for the
invocation of 'cc', but this is not necessarily true when
people run bison on an ASCII host and then use cc on an EBCDIC
host. I don't think these topics are worth our time
addressing (unless we find a gung-ho volunteer for EBCDIC or
PDP-10 ports :-) but they should probably be documented
somewhere.

More importantly, Bison does not currently allow NUL bytes in
tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
the source code. This should get fixed.
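
The difficulty with NUL bytes is that any code treating token strings as NUL-terminated C strings silently truncates them, and storing an explicit length avoids that. This is a minimal sketch. The names `counted_string`, `COUNTED_LITERAL` and `counted_equal` are hypothetical and are not Bison internals. The #if line turns the 8-bit assumption Paul mentions into a compile-time check, which is one cheap way to document it.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Make the 8-bit byte assumption explicit instead of silent.  */
#if UCHAR_MAX != 255
# error "this code assumes 8-bit bytes"
#endif

/* A byte string with an explicit length, so that NUL bytes survive.  */
typedef struct
{
  const char *bytes;
  size_t length;
} counted_string;

/* Initializer for a string literal (not a pointer): sizeof counts every
   byte, embedded NULs included, plus the final NUL, hence the - 1.  */
#define COUNTED_LITERAL(Lit) { (Lit), sizeof (Lit) - 1 }

/* Compare lengths and all bytes; strcmp would stop at the first NUL.  */
static int
counted_equal (counted_string a, counted_string b)
{
  return a.length == b.length && memcmp (a.bytes, b.bytes, a.length) == 0;
}
```

With "x\0y", strlen reports 1 and strcmp considers it equal to "x", whereas the counted version keeps all 3 bytes and tells the two apart.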

* --graph
Show reductions.

* Broken options?
** %token-table
** Skeleton strategy
Must we keep %token-table?

* BTYacc
See if we can integrate backtracking in Bison. Charles-Henri de
Boysson <de-boy_c@epita.fr> has been working on this, but never gave
the results.

Vadim Maslow, the maintainer of BTYacc, was once contacted. Adjusting
the Bison grammar parser will be needed to support some extra BTYacc
features. This is less urgent.

** Keeping the conflicted actions
First, analyze the differences between byacc and btyacc (I'm referring
to the executables). Find where the conflicts are preserved.

** Compare with the GLR tables
See how isomorphic the BTYacc approach and the GLR adjustments in
Bison are. As much as possible one should try to use the same
implementation in the Bison executables. I insist: it should be
very feasible to use the very same conflict tables.

** Adjust the skeletons
Import the skeletons for C and C++.


* Precedence

** Partial order
It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders (sounds like series/parallel orders to me).
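
A partial order can be represented by a "binds tighter than" relation over precedence levels that is closed under transitivity and leaves some pairs incomparable. This is a minimal sketch, assuming levels are small integers. It uses a plain adjacency matrix with Warshall's transitive closure rather than any series/parallel structure. The names `prec_order`, `prec_close` and `prec_compare` are hypothetical. An incomparable pair would presumably leave the conflict unresolved and reported, as happens today when no precedence is declared.

```c
#define MAX_LEVELS 8

/* tighter[a][b] != 0 means "level a binds tighter than level b".
   Pairs that are never related stay incomparable: the order is
   partial.  */
typedef struct
{
  int n;                                    /* number of levels in use */
  unsigned char tighter[MAX_LEVELS][MAX_LEVELS];
} prec_order;

enum prec_cmp { PREC_LOWER = -1, PREC_INCOMPARABLE = 0, PREC_HIGHER = 1 };

/* Warshall's algorithm: make the relation transitive, so that relations
   declared separately compose.  A cyclic declaration would show up as
   tighter[a][a] != 0 afterwards and should be reported.  */
static void
prec_close (prec_order *o)
{
  int i, j, k;
  for (k = 0; k < o->n; ++k)
    for (i = 0; i < o->n; ++i)
      if (o->tighter[i][k])
        for (j = 0; j < o->n; ++j)
          if (o->tighter[k][j])
            o->tighter[i][j] = 1;
}

/* Compare two levels; PREC_INCOMPARABLE means precedence cannot
   resolve the conflict.  */
static enum prec_cmp
prec_compare (const prec_order *o, int a, int b)
{
  if (o->tighter[a][b])
    return PREC_HIGHER;
  if (o->tighter[b][a])
    return PREC_LOWER;
  return PREC_INCOMPARABLE;
}
```

For instance, suppose one module declares '*' over '+' and '^' (exponentiation) over '*', and another module declares '&&' over '||'. The closure then makes '^' bind tighter than '+', while '+' and '&&' remain incomparable.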

** RR conflicts
See if we can use precedence between rules to solve RR conflicts. See
what POSIX says.


* $undefined
From Hans:
- If the Bison-generated parser encounters an undefined number in the
character range, that character is written out in diagnostic messages, in
addition to the $undefined value.

Suggest: Change the name $undefined to undefined; looks better in outputs.
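
Here is a minimal sketch of such a diagnostic, assuming the character range is 0..UCHAR_MAX. `describe_undefined` is a hypothetical helper and is not part of any skeleton. Printable characters are shown as is and other bytes as octal escapes, while numbers outside the character range keep the plain $undefined.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Write into BUF (SIZE bytes) the name to use in diagnostics for a
   token number TOKEN that the grammar does not define.  */
static void
describe_undefined (int token, char *buf, size_t size)
{
  if (0 <= token && token <= UCHAR_MAX)
    {
      if (isprint (token))
        snprintf (buf, size, "$undefined ('%c')", token);
      else
        snprintf (buf, size, "$undefined ('\\%03o')", (unsigned) token);
    }
  else
    snprintf (buf, size, "$undefined");
}
```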


* Default Action
From Hans:
- For use with my C++ parser, I transported the "switch (yyn)" statement
that Bison writes to the bison.simple skeleton file. This way, I can remove
the current default rule $$ = $1 implementation, which causes a double
assignment to $$ which may not be OK under C++, replacing it with a
"default:" part within the switch statement.

Note that the default rule $$ = $1, when typed, is perfectly OK under C,
but in the C++ implementation I made, this rule is different from
$<type_name>$ = $<type_name>1. I therefore think that one should implement
a Bison option where every typed default rule is explicitly written out
(same typed rules can of course be grouped together).
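
The difference between the two schemes shows up in the control flow of the reduction. This is a minimal C sketch with a hypothetical int YYSTYPE and hypothetical rule numbers: 1 for exp: exp '+' exp, which has an action, and any other number for a rule without one. The skeleton's variables are simplified. In the first function $$ = $1 is performed before every action; in the second it happens only in the default: branch. In C both give the same results. The extra assignment only matters with C++ value types, where it may be incorrect or costly.

```c
typedef int YYSTYPE;   /* assumption: a plain C value type */

/* Current scheme, simplified: $$ = $1 is done before every action, so
   rule 1 assigns $$ twice.  VSP points to the value of the last RHS
   symbol, LEN is the rule length; empty rules get 0 in this sketch.  */
static YYSTYPE
reduce_current (int rule, const YYSTYPE *vsp, int len)
{
  YYSTYPE val = len > 0 ? vsp[1 - len] : 0;   /* default $$ = $1 */
  switch (rule)
    {
    case 1:                                   /* exp: exp '+' exp */
      val = vsp[-2] + vsp[0];
      break;
    }
  return val;
}

/* Proposed scheme: $$ = $1 only in the default: branch, so each rule
   assigns $$ exactly once.  */
static YYSTYPE
reduce_proposed (int rule, const YYSTYPE *vsp, int len)
{
  YYSTYPE val;
  switch (rule)
    {
    case 1:                                   /* exp: exp '+' exp */
      val = vsp[-2] + vsp[0];
      break;
    default:                                  /* rules without an action */
      val = len > 0 ? vsp[1 - len] : 0;
      break;
    }
  return val;
}
```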

* Pre and post actions.
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
Subject: YYACT_EPILOGUE
To: bug-bison@gnu.org
X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago

The other day I had the need for explicitly building the parse tree. I
used %locations for that and defined YYLLOC_DEFAULT to call a function
that returns the tree node for the production. Easy. But I also needed
to assign the S-attribute to the tree node. That cannot be done in
YYLLOC_DEFAULT, because it is invoked before the action is executed.
The way I solved this was to define a macro YYACT_EPILOGUE that would
be invoked after the action. For reasons of symmetry I also added
YYACT_PROLOGUE. Although I had no use for that I can envision how it
might come in handy for debugging purposes.
All that is needed is to add

  #if YYLSP_NEEDED
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
  #else
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
  #endif

at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.

I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
to bison. If you're interested, I'll work on a patch.
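
The proposal boils down to two user-overridable macros that default to nothing and bracket the action. This is a minimal sketch of where they would sit in a simplified reduction. It uses the argument list of the non-location variant above, a hypothetical int YYSTYPE, and a hypothetical rule 1 (exp: exp '+' exp). The user-side definitions that record calls are illustrative only. The point is that the epilogue sees the value computed by the action, which YYLLOC_DEFAULT cannot.

```c
typedef int YYSTYPE;   /* assumption */

/* What a user might define before the skeleton code (hypothetical):
   count prologue calls and record the value seen by the epilogue.  */
static int prologue_calls;
static YYSTYPE last_epilogue_value;
#define YYACT_PROLOGUE(Val, Rhs, Len) (++prologue_calls)
#define YYACT_EPILOGUE(Val, Rhs, Len) (last_epilogue_value = (Val))

/* Defaults the skeleton would provide: do nothing.  */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Val, Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

/* Simplified reduction: VSP points to the value of the last RHS symbol,
   LEN is the rule length; the hooks get the arguments quoted above.  */
static YYSTYPE
reduce (int rule, const YYSTYPE *vsp, int len)
{
  YYSTYPE yyval = vsp[1 - len];            /* default $$ = $1 */
  YYACT_PROLOGUE (yyval, (vsp - len), len);
  switch (rule)
    {
    case 1:                                /* exp: exp '+' exp */
      yyval = vsp[-2] + vsp[0];
      break;
    }
  YYACT_EPILOGUE (yyval, (vsp - len), len);
  return yyval;
}
```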

* Better graphics
Equip the parser with a means to create the (visual) parse tree.

-----

Copyright (C) 2001, 2002, 2003, 2004, 2006, 2008 Free Software Foundation,
Inc.

This file is part of Bison, the GNU Compiler Compiler.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.