* Short term
** scan-code.l
Avoid variables for format strings, as then GCC cannot check them.
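Here is a minimal sketch of the point above. It assumes a GCC-compatible compiler. If a printf-like diagnostic helper is declared with the format attribute, -Wformat can check every call whose format is a string literal. The helper complain_to and the FORMAT_CHECK macro are hypothetical and are not Bison's actual complain API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Mark printf-like functions so GCC (and Clang) can check their calls.  */
#if defined __GNUC__
# define FORMAT_CHECK(fmt, first) __attribute__ ((format (printf, fmt, first)))
#else
# define FORMAT_CHECK(fmt, first)
#endif

/* Hypothetical diagnostic helper: argument 3 is the format string,
   the variadic arguments start at position 4.  */
static int complain_to (char *buf, size_t n, const char *fmt, ...)
  FORMAT_CHECK (3, 4);

static int
complain_to (char *buf, size_t n, const char *fmt, ...)
{
  va_list args;
  va_start (args, fmt);
  int res = vsnprintf (buf, n, fmt, args);
  va_end (args);
  return res;
}

/* Checked: the literal is visible to -Wformat, so passing an int for
   %s is diagnosed at compile time.
     complain_to (buf, sizeof buf, "%s: %d", name, count);

   Generally not checked: once the format lives in a variable, GCC
   cannot match it against the arguments; -Wformat-nonliteral only
   warns that the format is not a literal.
     const char *fmt = "%s: %d";
     complain_to (buf, sizeof buf, fmt, name, count);  */
```

To keep the checking, the literals must stay at the call sites or in macros that expand to literals.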

** m4 names
b4_shared_declarations no longer describes what it contains. Rename
it, for instance to b4_parser_declaration.

** glr.cc: %defines
It should not be mandatory.

** $ and others in epilogue
A stray $ is a warning in the actions, but an error in the epilogue.
IMHO, it should not even be a warning in the epilogue.

** obstack_copy etc.
There seem to be other interesting obstack functions that we should
consider using.
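One candidate is obstack_copy0. This sketch assumes that glibc's <obstack.h> (or Gnulib's copy) is available. The function copies a byte range into the obstack and appends a NUL in a single call, which replaces an obstack_grow, obstack_1grow, obstack_finish sequence. The wrapper intern_copy is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <obstack.h>

/* Obstacks require the user to name the chunk allocation functions.  */
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* Hypothetical helper: copy the first LEN bytes of S into OBS as a
   NUL-terminated string.  Equivalent to
     obstack_grow (obs, s, len);
     obstack_1grow (obs, '\0');
     return obstack_finish (obs);
   but in one call.  */
static char *
intern_copy (struct obstack *obs, const char *s, size_t len)
{
  return obstack_copy0 (obs, s, len);
}
```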

** stack.hh
Get rid of it. The original idea is nice, but it actually makes the
code harder to follow, and needlessly different from the other
skeletons.

** Variable names.
What should we name `variant' and `lex_symbol'?

** Get rid of fake #lines [Bison: ...]
Possibly as simple as checking whether the column number is nonnegative.

I have seen messages like the following from GCC.

  <built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory
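Here is a minimal sketch of the check suggested above. It assumes a hypothetical location type whose column is negative when Bison made the location up. A #line directive is emitted only for a genuine location, so the compiler is never pointed at a bogus position. The struct and function names are illustrative and are not Bison's.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical location; Bison's real location type differs.  */
struct location
{
  const char *file;
  int line;
  int column;   /* negative for synthetic locations */
};

/* Write a #line directive for LOC into BUF (size N).  Return the number
   of characters written, or 0 when LOC is fake and nothing is emitted.
   FILE is assumed not to need quoting or escaping.  */
static int
emit_line_directive (char *buf, size_t n, struct location loc)
{
  if (loc.column < 0 || loc.file == NULL)
    {
      if (n > 0)
        buf[0] = '\0';
      return 0;
    }
  return snprintf (buf, n, "#line %d \"%s\"\n", loc.line, loc.file);
}
```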


** Discuss %printer/%destructor in the case of C++.
It would be very nice to provide the symbol classes with an operator<<
and a destructor. Unfortunately, the syntax we have chosen for
%destructor and %printer makes them hard to reuse. For instance, the
user is invited to write something like

  %printer { debug_stream() << $$; } <my_type>;

which is hard to reuse elsewhere since it relies on "debug_stream()"
to find the stream to use. The same applies to %destructor: we told
the user she could use the members of the Parser class in the
printers/destructors. That does not work for an operator<<, which is
no longer bound to a particular parser and applies to a standalone
symbol.
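The reusable shape is easier to see in C terms. In this sketch the printer receives its output stream as a parameter instead of fetching it from the parser as debug_stream() does. Any caller can then reuse it, and an operator<< in C++ could wrap it the same way. The type my_value and the function print_my_value are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical semantic value type.  */
struct my_value
{
  const char *name;
  int number;
};

/* Standalone printer: everything it needs comes from its arguments,
   so it is not tied to any parser instance.  */
static void
print_my_value (FILE *out, const struct my_value *v)
{
  fprintf (out, "%s=%d", v->name, v->number);
}
```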

** Rename LR0.cc
Rename it as lr0.cc: why upper case?

** bench several bisons.
Enhance bench.pl with %b to run different bisons.

* Various
** YYERRCODE
Defined to 256, but not used, not documented. Probably the token
number for the error token, which POSIX wants to be 256, but which
Bison might renumber if the user used number 256. Keep, fix, and
document? Throw away?

Also, why don't we output the name of the error token? It is
explicitly skipped:

  /* Skip error token and tokens without identifier.  */
  if (sym != errtoken && id)

Of course there are issues with name spaces, but if we disabled this
check we would get something simpler and more consistent than the
special case of YYERRCODE:

  enum yytokentype {
    error = 256,
    // ...
  };
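Here is a sketch of the simplification proposed above. It assumes a hypothetical symbol table in which the error token is an ordinary entry. A single loop generates the enum and skips only symbols that have no identifier, so error = 256 is emitted like any other token and YYERRCODE is no longer needed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical symbol record: ID is NULL for symbols such as "=" or
   $undefined that have no identifier.  */
struct symbol
{
  const char *id;
  int user_number;
};

/* Write "  ID = NUMBER,\n" into BUF (size N > 0) for every symbol that
   has an identifier.  Unlike the current code, the error token is not
   special-cased.  Return the number of complete lines' characters.  */
static size_t
emit_token_enum (char *buf, size_t n, const struct symbol *syms,
                 size_t count)
{
  size_t used = 0;
  buf[0] = '\0';
  for (size_t i = 0; i < count; ++i)
    if (syms[i].id)
      {
        int w = snprintf (buf + used, n - used, "  %s = %d,\n",
                          syms[i].id, syms[i].user_number);
        if (w < 0 || (size_t) w >= n - used)
          break;   /* out of room: stop, BUF ends with a truncated line */
        used += (size_t) w;
      }
  return used;
}
```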


We could (should?) also treat the case of the undef_token, which is
numbered 257 for yylex and 2 internally. Both appear, for instance,
in toknum:

  const unsigned short int
  parser::yytoken_number_[] =
  {
    0, 256, 257, 258, 259, 260, 261, 262, 263, 264,

while here

  enum yytokentype {
    TOK_EOF = 0,
    TOK_EQ = 258,

so both 256 and 257 are "mysterious".

  const char*
  const parser::yytname_[] =
  {
    "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",
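The following sketch makes the two numberings concrete by mirroring the excerpts above. It assumes that internal symbol i has external (yylex) number yytoken_number[i] and name yytname[i]. Only the first four entries of each table are copied, and the lookup function internal_of is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Internal symbol number -> number returned by yylex (from toknum).  */
static const unsigned short yytoken_number[] = { 0, 256, 257, 258 };

/* Internal symbol number -> printable name (from yytname_).  */
static const char *const yytname[] =
  { "\"end of command\"", "error", "$undefined", "\"=\"" };

/* Return the internal number of the token yylex reports as EXTERNAL,
   or -1 if it is unknown.  For instance 257 ($undefined) maps to 2.
   A linear search is enough for a sketch.  */
static int
internal_of (int external)
{
  for (size_t i = 0; i < sizeof yytoken_number / sizeof *yytoken_number; ++i)
    if (yytoken_number[i] == external)
      return (int) i;
  return -1;
}
```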


** yychar == yyempty_
The code in yyerrlab reads:

  if (yychar <= YYEOF)
    {
      /* Return failure if at end of input.  */
      if (yychar == YYEOF)
        YYABORT;
    }

There are only two values of yychar that can be <= YYEOF: YYEMPTY and
YYEOF. But I can't produce a situation where yychar is YYEMPTY here;
is it really possible? The test suite does not exercise this case.

This shows that it would be worthwhile to add skeleton coverage
analysis to the test suite.
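This sketch spells out the three cases behind the yyerrlab test above. It assumes yacc.c's values YYEMPTY == -2 and YYEOF == 0. The enum and the function classify_lookahead are hypothetical. The YYEMPTY branch is the one the test suite never reaches.

```c
#include <assert.h>

#define YYEMPTY (-2)   /* no lookahead token */
#define YYEOF 0        /* end of input */

/* Hypothetical summary of what the error-recovery path does with the
   current lookahead.  */
enum errlab_action
{
  ERRLAB_ABORT,     /* yychar == YYEOF: return failure */
  ERRLAB_NOTHING,   /* yychar == YYEMPTY: nothing to discard */
  ERRLAB_DISCARD    /* real token: destroy it, then yychar = YYEMPTY */
};

static enum errlab_action
classify_lookahead (int yychar)
{
  if (yychar <= YYEOF)
    /* Only YYEMPTY and YYEOF can get here.  */
    return yychar == YYEOF ? ERRLAB_ABORT : ERRLAB_NOTHING;
  return ERRLAB_DISCARD;
}
```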

** Table definitions
It should be very easy to factor the definition of the various tables,
including the separation between declaration and definition. See for
instance b4_table_define in lalr1.cc. This way, we could even factor
C vs. C++ definitions.

* From lalr1.cc to yacc.c
** Single stack
Merging the three stacks in lalr1.cc simplified the code, prompted
other improvements, and also made it faster (probably because memory
management is performed once instead of three times). I suggest that
we do the same in yacc.c.
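Here is a minimal sketch of the single-stack layout. It uses simplified stand-ins for YYSTYPE and YYLTYPE. Each entry bundles the state, the semantic value, and the location, so growing the stack takes one realloc instead of three. The names stack_push and stack_pop are hypothetical and do not come from lalr1.cc or yacc.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef int value_type;                                  /* for YYSTYPE */
typedef struct { int first_line, last_line; } location_type; /* for YYLTYPE */

/* One entry per symbol: the three former stacks become fields.  */
struct stack_symbol
{
  int state;
  value_type value;
  location_type location;
};

struct stack
{
  struct stack_symbol *items;
  size_t size, capacity;
};

/* Push one entry, growing geometrically with a single allocation.
   Return false on memory exhaustion.  */
static bool
stack_push (struct stack *s, int state, value_type v, location_type loc)
{
  if (s->size == s->capacity)
    {
      size_t cap = s->capacity ? 2 * s->capacity : 16;
      struct stack_symbol *p = realloc (s->items, cap * sizeof *p);
      if (!p)
        return false;
      s->items = p;
      s->capacity = cap;
    }
  s->items[s->size++] = (struct stack_symbol) { state, v, loc };
  return true;
}

/* Pop N entries, as a reduction does for a rule of length N.  */
static void
stack_pop (struct stack *s, size_t n)
{
  assert (n <= s->size);
  s->size -= n;
}
```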

** yysyntax_error
The code in glr.c and yacc.c is very similar; we can certainly factor
some parts.


* Report

** Figures
Some statistics about the grammar and the parser would be useful,
especially when asking the user to send some information about the
grammars she is working on. We should probably also include some
information about the variables (for instance, I'm not sure we even
specify which LR variant was used).

** GLR
How would Paul like to display the conflicted actions? In particular,
what should happen when two reductions are possible on a given
lookahead token, but one is part of $default? Should we make the two
reductions explicit, or just keep $default? See the following point.

** Disabled Reductions
See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
what we want to do.

** Documentation
Extend with error productions. The hard part will probably be finding
the right rule so that a single state does not exhibit too many yet
undocumented ``features''. Maybe an empty action ought to be
presented too. Shall we try to make a single grammar with all these
features, or should we have several very small grammars?

** --report=conflict-path
Provide better assistance for understanding the conflicts by providing
a sample text exhibiting the (LALR) ambiguity. See the paper from
DeRemer and Pennello: they already provide the algorithm.

** Statically check for potential ambiguities in GLR grammars. See
<http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.


* Extensions

** $-1
We should find a means to provide access to values deep in the
stack. For instance, instead of

  baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }

we should be able to have:

  foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }

Or something like this.
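The following sketch shows why $-1 reaches deep into the stack. It uses the indexing of yacc.c-style skeletons and assumes yyvsp points at the value of the last right-hand-side symbol. In a rule of length n, $k is then yyvsp[k - n], so $0 and $-1 lie below the rule's own symbols. The helper dollar is hypothetical.

```c
#include <assert.h>

/* With YYVSP pointing at the topmost value ($N of a rule of length N),
   $K lives at YYVSP[K - N].  K may be 0 or negative: those values
   belong to symbols to the left of the rule, pushed by enclosing
   rules, and nothing in the rule itself guarantees their type.  */
static int
dollar (const int *yyvsp, int n, int k)
{
  return yyvsp[k - n];
}
```

For example, when baz: qux is reduced (n == 1) with foo, bar, qux on the stack, the expression $<foo>-1 + $<bar>0 + $1 reads three consecutive slots.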

** %if and the like
It should be possible to have %if/%else/%endif. The implementation is
not clear: should it be lexical or syntactic? Vadim Maslow thinks it
must be in the scanner: we must not parse what is in a switched-off
part of %if. Akim Demaille thinks it should be in the parser, so as
to avoid falling into another CPP mistake.

** XML Output
There are a couple of available extensions of Bison targeting some XML
output. Some day we should consider including them. One issue is
that they seem to be quite orthogonal to the parsing technique, and
seem to depend mostly on the possibility to have some code triggered
for each reduction. As a matter of fact, such hooks could also be
used to generate the yydebug traces. Some generic scheme probably
exists in there.

XML output for GNU Bison and gcc
http://www.cs.may.ie/~jpower/Research/bisonXML/

XML output for GNU Bison
http://yaxx.sourceforge.net/

* Unit rules
Maybe we could expand unit rules, i.e., transform

  exp: arith | bool;
  arith: exp '+' exp;
  bool: exp '&' exp;

into

  exp: exp '+' exp | exp '&' exp;

when there are no actions. This can significantly speed up some
grammars. I can't find the papers. In particular the book `LR
parsing: Theory and Practice' is impossible to find, but according to
`Parsing Techniques: a Practical Guide', it includes information about
this issue. Does anybody have it?
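This sketch applies the transformation to a toy grammar representation. It assumes that rules are stored as arrays of symbol names and that each rule records whether it has an action. A unit rule A: B is replaced by copies of B's rules with A as the left-hand side, provided B is a nonterminal different from A and none of its rules has an action. The data structures are hypothetical. Chains of unit rules would need repeated passes, and removing B's rules once they become unreachable is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_RHS 3
#define MAX_RULES 16

/* Hypothetical rule representation.  */
struct rule
{
  const char *lhs;
  const char *rhs[MAX_RHS];
  int len;
  bool has_action;
};

/* A symbol is a nonterminal if it is the lhs of some rule.  */
static bool
is_nonterminal (const struct rule *g, int n, const char *sym)
{
  for (int i = 0; i < n; ++i)
    if (strcmp (g[i].lhs, sym) == 0)
      return true;
  return false;
}

/* True if no rule of SYM carries an action.  */
static bool
all_actionless (const struct rule *g, int n, const char *sym)
{
  for (int i = 0; i < n; ++i)
    if (strcmp (g[i].lhs, sym) == 0 && g[i].has_action)
      return false;
  return true;
}

/* One pass: write the expanded grammar to OUT (capacity MAX_RULES) and
   return its number of rules.  */
static int
expand_unit_rules (const struct rule *g, int n, struct rule *out)
{
  int m = 0;
  for (int i = 0; i < n; ++i)
    {
      const struct rule *r = &g[i];
      bool unit = r->len == 1 && !r->has_action
        && is_nonterminal (g, n, r->rhs[0])
        && strcmp (r->rhs[0], r->lhs) != 0
        && all_actionless (g, n, r->rhs[0]);
      if (!unit)
        {
          assert (m < MAX_RULES);
          out[m++] = *r;
          continue;
        }
      /* Inline: copy each rule of rhs[0], renaming its lhs.  */
      for (int j = 0; j < n; ++j)
        if (strcmp (g[j].lhs, r->rhs[0]) == 0)
          {
            assert (m < MAX_RULES);
            out[m] = g[j];
            out[m].lhs = r->lhs;
            ++m;
          }
    }
  return m;
}
```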



* Documentation

** History/Bibliography
Some history of Bison and some bibliography would be most welcome.
Are there any Texinfo standards for bibliography?

* Coding system independence
Paul notes:

Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
255). It also assumes that the 8-bit character encoding is
the same for the invocation of 'bison' as it is for the
invocation of 'cc', but this is not necessarily true when
people run bison on an ASCII host and then use cc on an EBCDIC
host. I don't think these topics are worth our time
addressing (unless we find a gung-ho volunteer for EBCDIC or
PDP-10 ports :-) but they should probably be documented
somewhere.

More importantly, Bison does not currently allow NUL bytes in
tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
the source code. This should get fixed.
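This short illustration shows why NUL bytes need care when token strings are kept as plain C strings. strlen stops at the first NUL, so "x\0y" looks like a one-character token. The hypothetical counted_string below carries an explicit length and keeps all three bytes. The sketch also turns the 8-bit assumption into a compile-time check.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Bison currently assumes 8-bit bytes; make that assumption explicit.  */
#if UCHAR_MAX != 255
# error "this sketch assumes 8-bit bytes"
#endif

/* Hypothetical counted string: the length is stored, not computed.  */
struct counted_string
{
  const char *bytes;
  size_t size;
};

/* Build a counted string from a string literal, keeping embedded NULs.
   sizeof counts the terminating NUL, hence the - 1.  */
#define COUNTED_LITERAL(lit) \
  ((struct counted_string) { (lit), sizeof (lit) - 1 })

/* Compare two counted strings byte by byte, NULs included.  */
static int
counted_equal (struct counted_string a, struct counted_string b)
{
  return a.size == b.size && memcmp (a.bytes, b.bytes, a.size) == 0;
}
```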

* --graph
Show reductions.

* Broken options?
** %token-table
** Skeleton strategy
Must we keep %token-table?

* Precedence

** Partial order
It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders (sounds like series/parallel orders to me).
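This sketch shows what a partial order could look like. It assumes that precedence declarations are recorded as explicit "binds tighter than" pairs between hypothetical precedence groups. The transitive closure, computed here with Warshall's algorithm, yields every implied relation. Two groups related in neither direction are incomparable, which a total order cannot express. Series/parallel orders are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

#define NGROUPS 4

/* tighter[a][b]: group A binds tighter than group B.  */
static bool tighter[NGROUPS][NGROUPS];

/* Warshall's algorithm: add every relation implied by transitivity.  */
static void
close_transitively (void)
{
  for (int k = 0; k < NGROUPS; ++k)
    for (int i = 0; i < NGROUPS; ++i)
      if (tighter[i][k])
        for (int j = 0; j < NGROUPS; ++j)
          if (tighter[k][j])
            tighter[i][j] = true;
}

enum order { LOWER = -1, INCOMPARABLE = 0, HIGHER = 1 };

static enum order
compare_prec (int a, int b)
{
  if (tighter[a][b])
    return HIGHER;
  if (tighter[b][a])
    return LOWER;
  return INCOMPARABLE;   /* no precedence: report the conflict */
}
```

In this setting, a module that declares only CONCAT need not say anything about PLUS or TIMES.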

** RR conflicts
See if we can use precedence between rules to solve RR conflicts. See
what POSIX says.


* $undefined
From Hans:
- If the Bison-generated parser encounters an undefined number in the
character range, that character should be written out in diagnostic
messages, in addition to the $undefined value.

Suggest: Change the name $undefined to undefined; looks better in outputs.


* Default Action
From Hans:
- For use with my C++ parser, I transported the "switch (yyn)" statement
that Bison writes to the bison.simple skeleton file. This way, I can remove
the current default rule $$ = $1 implementation, which causes a double
assignment to $$ which may not be OK under C++, replacing it with a
"default:" part within the switch statement.

Note that the default rule $$ = $1, when typed, is perfectly OK under C,
but in the C++ implementation I made, this rule is different from
$<type_name>$ = $<type_name>1. I therefore think that one should implement
a Bison option where every typed default rule is explicitly written out
(same typed rules can of course be grouped together).
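Here is a sketch of Hans's suggestion in plain C. It uses hypothetical rule numbers and an int semantic type, and it assumes that, as in yacc.c, the default $$ = $1 would otherwise be assigned before the switch. Placing the default action in the switch's default: branch means each reduction assigns the result exactly once.

```c
#include <assert.h>

/* Hypothetical reduction routine: rule 2 is exp: exp '+' exp and
   rule 3 is exp: exp '*' exp.  YYVSP points at the value of the last
   right-hand-side symbol and YYLEN is the rule length.  */
static int
reduce (int yyn, const int *yyvsp, int yylen)
{
  int yyval;
  /* No "yyval = yyvsp[1 - yylen];" up front: that would assign $$
     twice for rules that have an action.  */
  switch (yyn)
    {
    case 2:
      yyval = yyvsp[-2] + yyvsp[0];
      break;
    case 3:
      yyval = yyvsp[-2] * yyvsp[0];
      break;
    default:
      /* Rules without an action: $$ = $1, assigned exactly once.  */
      yyval = yyvsp[1 - yylen];
      break;
    }
  return yyval;
}
```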

* Pre and post actions.
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
Subject: YYACT_EPILOGUE
To: bug-bison@gnu.org
X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago

The other day I needed to build the parse tree explicitly. I
used %locations for that and defined YYLLOC_DEFAULT to call a function
that returns the tree node for the production. Easy. But I also needed
to assign the S-attribute to the tree node. That cannot be done in
YYLLOC_DEFAULT, because it is invoked before the action is executed.
The way I solved this was to define a macro YYACT_EPILOGUE that would
be invoked after the action. For reasons of symmetry I also added
YYACT_PROLOGUE. Although I had no use for that I can envision how it
might come in handy for debugging purposes.
All that is needed is to add

  #if YYLSP_NEEDED
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
  #else
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
  #endif

at the proper place in bison.simple. Ditto for YYACT_PROLOGUE.

I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
to bison. If you're interested, I'll work on a patch.
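Here is a sketch of the proposed hooks for a simplified reduction step without locations. YYACT_PROLOGUE and YYACT_EPILOGUE default to no-ops in the skeleton. A user who defines YYACT_EPILOGUE before the skeleton code sees it invoked after the action, when yyval already holds the S-attribute. The function reduce_step and the recording variables are hypothetical.

```c
#include <assert.h>

/* User side: record what the epilogue saw (a real user might attach
   yyval to a tree node here).  */
static int epilogue_calls;
static int epilogue_value;
#define YYACT_EPILOGUE(yyval, yyvsp, yylen) \
  (epilogue_calls++, epilogue_value = (yyval))

/* Skeleton side: default the hooks to no-ops if the user did not
   define them.  */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(yyval, yyvsp, yylen) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(yyval, yyvsp, yylen) ((void) 0)
#endif

/* Hypothetical reduction: YYVSP points at the last RHS value.  */
static int
reduce_step (const int *yyvsp, int yylen)
{
  int yyval = 0;
  YYACT_PROLOGUE (yyval, (yyvsp - yylen), yylen);
  yyval = yyvsp[1 - yylen] + yyvsp[0];   /* the user action: $$ = $1 + $N */
  YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
  return yyval;
}
```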

* Better graphics
Equip the parser with a means to create the (visual) parse tree.

* Complaint submessage indentation.
We already have an implementation that works fairly well for named
reference messages, but it would be nice to use it consistently for all
submessages from Bison. For example, the "previous definition"
submessage or the list of correct values for a %define variable might
look better with indentation.

However, the current implementation makes the assumption that the
location printed on the first line is not usually much shorter than the
locations printed on the submessage lines that follow. That assumption
may not hold true as often for some kinds of submessages, especially if
we ever support multiple grammar files.

Here's a proposal for how a new implementation might look:

http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00086.html


Local Variables:
mode: outline
coding: utf-8
End:

-----

Copyright (C) 2001-2004, 2006, 2008-2012 Free Software Foundation, Inc.

This file is part of Bison, the GNU Compiler Compiler.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
	361	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	362	GNU General Public License for more details.
	363
	364	You should have received a copy of the GNU General Public License
	365	along with this program. If not, see <http://www.gnu.org/licenses/>.