git.saurik.com Git - bison.git/blame_incremental

Commit	Line	Data
	1	* Short term
	2	** push-parser
	3	Check it too when checking the different kinds of parsers. And be
	4	sure to check that the initial-action is performed once per parsing.
	5
	6	** m4 names
	7	The name b4_shared_declarations no longer describes what the macro does.
	8	Rename it, for instance to b4_parser_declaration.
	9
	10	** $ and others in epilogue
	11	A stray $ is a warning in the actions, but an error in the epilogue.
	12	IMHO, it should not even be a warning in the epilogue.
	13
	14	** stack.hh
	15	Get rid of it. The original idea is nice, but actually it makes
	16	the code harder to follow, and uselessly different from the other
	17	skeletons.
	18
	19	** Variable names.
	20	What should we name `variant' and `lex_symbol'?
	21
	22	** Get rid of fake #lines [Bison: ...]
	23	Possibly as simple as checking whether the column number is nonnegative.
	24
	25	I have seen messages like the following from GCC.
	26
	27	<built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory
	28
	29
	30	** Discuss %printer/%destroy in the case of C++.
	31	It would be very nice to provide the symbol classes with an operator<<
	32	and a destructor. Unfortunately the syntax we have chosen for
	33	%destroy and %printer makes them hard to reuse. For instance, the user
	34	is invited to write something like
	35
	36	%printer { debug_stream() << $$; } <my_type>;
	37
	38	which is hard to reuse elsewhere since it relies on
	39	"debug_stream()" to find the stream to use. The same applies to
	40	%destroy: we told the user she could use the members of the Parser
	41	class in the printers/destructors, which does not suit an operator<<:
	42	it is no longer bound to a particular parser, it is just a
	43	standalone symbol.
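
A minimal sketch of the kind of standalone printer discussed above. Everything here is hypothetical (the `symbol` struct, its members, and the `describe` helper are invented for illustration and are not part of any Bison skeleton); the point is only that the stream is passed in rather than fetched through debug_stream().

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical symbol class, not tied to any parser object.
struct symbol
{
  std::string name;
  int value;
};

// Standalone printer: the stream is a parameter, so nothing here
// depends on a parser member such as debug_stream().
std::ostream& operator<<(std::ostream& out, const symbol& s)
{
  return out << s.name << " (" << s.value << ')';
}

// Hypothetical helper showing that any output stream can be used.
std::string describe(const symbol& s)
{
  std::ostringstream out;
  out << s;
  return out.str();
}
```

With such an operator, the same printing code could serve debug traces, error messages, or user code alike.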
	44
	45	** Rename LR0.cc
	46	as lr0.cc, why upper case?
	47
	48	** bench several bisons.
	49	Enhance bench.pl with %b to run different bisons.
	50
	51	* Various
	52	** YYERRCODE
	53	Defined to 256, but not used, not documented. Probably the token
	54	number for the error token, which POSIX wants to be 256, but which
	55	Bison might renumber if the user used number 256. Keep fix and doc?
	56	Throw away?
	57
	58	Also, why don't we output the token name of the error token in the
	59	output? It is explicitly skipped:
	60
	61	/* Skip error token and tokens without identifier. */
	62	if (sym != errtoken && id)
	63
	64	Of course there are issues with name spaces, but if we stop skipping it we get
	65	something which seems simpler and more consistent than
	66	the special case YYERRCODE.
	67
	68	enum yytokentype {
	69	error = 256,
	70	// ...
	71	};
	72
	73
	74	We could (should?) also treat the case of the undef_token, which is
	75	numbered 257 for yylex, and 2 internally. Both appear for instance in
	76	toknum:
	77
	78	const unsigned short int
	79	parser::yytoken_number_[] =
	80	{
	81	0, 256, 257, 258, 259, 260, 261, 262, 263, 264,
	82
	83	while here
	84
	85	enum yytokentype {
	86	TOK_EOF = 0,
	87	TOK_EQ = 258,
	88
	89	so both 256 and 257 are "mysterious".
	90
	91	const char*
	92	const parser::yytname_[] =
	93	{
	94	"\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",
	95
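
The relation between the two tables quoted above can be sketched as follows. This is an illustration only: the arrays contain just the first entries visible in the excerpts, and `token_name` is a hypothetical helper, not a function of the generated parser.

```cpp
#include <cstddef>
#include <string>

// First entries of the tables quoted above (truncated on purpose).
// Index = internal symbol number, value = external (yylex) token number.
static const unsigned short token_number[] = { 0, 256, 257, 258 };
static const char* const token_name_table[] = {
  "\"end of command\"", "error", "$undefined", "\"=\""
};

// Hypothetical helper: map an external token number to its name,
// or return "" when the number is not in the (truncated) table.
std::string token_name(int external)
{
  const std::size_t n = sizeof token_number / sizeof token_number[0];
  for (std::size_t i = 0; i < n; ++i)
    if (token_number[i] == external)
      return token_name_table[i];
  return "";
}
```

Looking up 256 and 257 yields "error" and "$undefined", which is why these two numbers exist in the tables even though yytokentype does not name them.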
	96
	97	** yychar == yyempty_
	98	The code in yyerrlab reads:
	99
	100	  if (yychar <= YYEOF)
	101	    {
	102	      /* Return failure if at end of input. */
	103	      if (yychar == YYEOF)
	104	        YYABORT;
	105	    }
	106
	107	There are only two yychar that can be <= YYEOF: YYEMPTY and YYEOF.
	108	But I can't produce the situation where yychar is YYEMPTY here, is it
	109	really possible? The test suite does not exercise this case.
	110
	111	This shows that it would be worthwhile to add skeleton
	112	coverage analysis to the test suite.
	113
	114	** Table definitions
	115	It should be very easy to factor the definition of the various tables,
	116	including the separation between declaration and definition. See for
	117	instance b4_table_define in lalr1.cc. This way, we could even factor
	118	C vs. C++ definitions.
	119
	120	* From lalr1.cc to yacc.c
	121	** Single stack
	122	Merging the three stacks in lalr1.cc simplified the code, prompted
	123	other improvements and also made it faster (probably because memory
	124	management is performed once instead of three times). I suggest that
	125	we do the same in yacc.c.
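
A rough sketch of the single-stack idea, written in C++ for brevity. The type and member names (`value_type`, `location_type`, `stack_symbol`, `parser_stack`) are placeholders chosen for this example and do not claim to match lalr1.cc or yacc.c.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the semantic value and location types.
struct value_type { int i; };
struct location_type { int first_line; int last_line; };

// One stack entry bundles what used to live on three parallel stacks.
struct stack_symbol
{
  int state;
  value_type value;
  location_type location;
};

class parser_stack
{
public:
  // A single push (and a single possible reallocation) per shift.
  void push(int state, value_type v, location_type l)
  {
    symbols_.push_back(stack_symbol{state, v, l});
  }

  // Pop the n topmost symbols, as done after a reduction of length n.
  void pop(std::size_t n = 1)
  {
    symbols_.resize(symbols_.size() - n);
  }

  // Access from the top: top(0) is the most recent symbol.
  const stack_symbol& top(std::size_t depth = 0) const
  {
    return symbols_[symbols_.size() - 1 - depth];
  }

  std::size_t size() const { return symbols_.size(); }

private:
  std::vector<stack_symbol> symbols_;
};
```

Because state, value and location travel together, growing the stack touches one buffer instead of three.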
	126
	127	** yysyntax_error
	128	The code in glr.c and yacc.c is very similar; we can certainly factor
	129	some parts.
	130
	131
	132	* Report
	133
	134	** Figures
	135	Some statistics about the grammar and the parser would be useful,
	136	especially when asking the user to send some information about the
	137	grammars she is working on. We should probably also include some
	138	information about the variables (I'm not sure for instance we even
	139	specify what LR variant was used).
	140
	141	** GLR
	142	How would Paul like to display the conflicted actions? In particular,
	143	what should be shown when two reductions are possible on a given lookahead token, but one is
	144	part of $default? Should we make the two reductions explicit, or just
	145	keep $default? See the following point.
	146
	147	** Disabled Reductions
	148	See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
	149	what we want to do.
	150
	151	** Documentation
	152	Extend with error productions. The hard part will probably be finding
	153	the right rule so that a single state does not exhibit too many yet
	154	undocumented ``features''. Maybe an empty action ought to be
	155	presented too. Shall we try to make a single grammar with all these
	156	features, or should we have several very small grammars?
	157
	158	** --report=conflict-path
	159	Provide better assistance for understanding the conflicts by providing
	160	a sample text exhibiting the (LALR) ambiguity. See the paper from
	161	DeRemer and Pennello: they already provide the algorithm.
	162
	163	** Statically check for potential ambiguities in GLR grammars. See
	164	<http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.
	165
	166
	167	* Extensions
	168
	169	** $-1
	170	We should find a means to provide access to values deep in the
	171	stack. For instance, instead of
	172
	173	baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }
	174
	175	we should be able to have:
	176
	177	foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }
	178
	179	Or something like this.
	180
	181	** %if and the like
	182	It should be possible to have %if/%else/%endif. The implementation is
	183	not clear: should it be lexical or syntactic? Vadim Maslow thinks it
	184	must be in the scanner: we must not parse what is in a switched off
	185	part of %if. Akim Demaille thinks it should be in the parser, so as
	186	to avoid falling into another CPP mistake.
	187
	188	** XML Output
	189	There are a couple of available extensions of Bison targeting some XML
	190	output. Some day we should consider including them. One issue is
	191	that they seem to be quite orthogonal to the parsing technique, and
	192	seem to depend mostly on the possibility to have some code triggered
	193	for each reduction. As a matter of fact, such hooks could also be
	194	used to generate the yydebug traces. Some generic scheme probably
	195	exists in there.
	196
	197	XML output for GNU Bison and gcc
	198	http://www.cs.may.ie/~jpower/Research/bisonXML/
	199
	200	XML output for GNU Bison
	201	http://yaxx.sourceforge.net/
	202
	203	* Unit rules
	204	Maybe we could expand unit rules, i.e., transform
	205
	206	exp: arith | bool;
	207	arith: exp '+' exp;
	208	bool: exp '&' exp;
	209
	210	into
	211
	212	exp: exp '+' exp | exp '&' exp;
	213
	214	when there are no actions. This can significantly speed up some
	215	grammars. I can't find the papers. In particular the book `LR
	216	parsing: Theory and Practice' is impossible to find, but according to
	217	`Parsing Techniques: a Practical Guide', it includes information about
	218	this issue. Does anybody have it?
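
The transformation above can be sketched as follows. This is not taken from the literature mentioned here; it is the usual unit-closure approach, applied to a hypothetical in-memory grammar representation (`rhs`, `grammar` and `expand_unit_rules` are names made up for this example), and it assumes the rules carry no actions.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

using rhs = std::vector<std::string>;
using grammar = std::map<std::string, std::vector<rhs>>;

// A rule is a unit rule when its right-hand side is a single nonterminal.
static bool is_unit(const grammar& g, const rhs& r)
{
  return r.size() == 1 && g.count(r[0]) != 0;
}

// Collect, in discovery order, the nonterminals reachable from `nt`
// through unit rules only (including `nt` itself).
static void unit_closure(const grammar& g, const std::string& nt,
                         std::set<std::string>& seen,
                         std::vector<std::string>& order)
{
  if (!seen.insert(nt).second)
    return;
  order.push_back(nt);
  for (const rhs& r : g.at(nt))
    if (is_unit(g, r))
      unit_closure(g, r[0], seen, order);
}

// Replace every unit rule by the non-unit rules it leads to.
grammar expand_unit_rules(const grammar& g)
{
  grammar res;
  for (const auto& entry : g)
    {
      std::set<std::string> seen;
      std::vector<std::string> order;
      unit_closure(g, entry.first, seen, order);
      std::vector<rhs>& rules = res[entry.first];
      for (const std::string& nt : order)
        for (const rhs& r : g.at(nt))
          if (!is_unit(g, r))
            rules.push_back(r);
    }
  return res;
}
```

In this sketch arith and bool are kept in the result even though exp no longer refers to them; a real implementation would presumably prune such unreachable nonterminals.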
	219
	220
	221
	222	* Documentation
	223
	224	** History/Bibliography
	225	Some history of Bison and some bibliography would be most welcome.
	226	Are there any Texinfo standards for bibliography?
	227
	228	* Coding system independence
	229	Paul notes:
	230
	231	Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
	232	255). It also assumes that the 8-bit character encoding is
	233	the same for the invocation of 'bison' as it is for the
	234	invocation of 'cc', but this is not necessarily true when
	235	people run bison on an ASCII host and then use cc on an EBCDIC
	236	host. I don't think these topics are worth our time
	237	addressing (unless we find a gung-ho volunteer for EBCDIC or
	238	PDP-10 ports :-) but they should probably be documented
	239	somewhere.
	240
	241	More importantly, Bison does not currently allow NUL bytes in
	242	tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
	243	the source code. This should get fixed.
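
As a general illustration of why NUL bytes are awkward (this shows standard C string behaviour, not Bison's actual token handling), a NUL-terminated view of "x\0y" stops at the first NUL, whereas a representation that carries its length keeps all three bytes. The helper names are invented for this example.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Length as seen by NUL-terminated string handling: stops at the first NUL.
std::size_t c_length(const char* s)
{
  return std::strlen(s);
}

// Length when the size is carried explicitly alongside the bytes.
std::size_t counted_length(const char* s, std::size_t n)
{
  return std::string(s, n).size();
}
```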
	244
	245	* --graph
	246	Show reductions.
	247
	248	* Broken options ?
	249	** %token-table
	250	** Skeleton strategy
	251	Must we keep %token-table?
	252
	253	* Precedence
	254
	255	** Partial order
	256	It is unfortunate that there is a total order for precedence. It
	257	makes it impossible to have modular precedence information. We should
	258	move to partial orders (sounds like series/parallel orders to me).
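
To make the idea concrete, here is a small sketch of a precedence relation that is only a partial order. It is purely hypothetical (the `precedence` class and the `order` enum are invented here and imply nothing about a future Bison syntax): only declared pairs and their transitive consequences are ordered, so operators from independent modules remain unrelated.

```cpp
#include <map>
#include <set>
#include <string>

enum class order { lower, higher, equal, unrelated };

// Hypothetical precedence table storing only declared "low < high" pairs.
class precedence
{
public:
  // Declare that `low` binds less tightly than `high`.
  void declare(const std::string& low, const std::string& high)
  {
    above_[low].insert(high);
  }

  // Compare two operators; unrelated operators have no defined order.
  order compare(const std::string& a, const std::string& b) const
  {
    if (a == b)
      return order::equal;
    if (reaches(a, b))
      return order::lower;
    if (reaches(b, a))
      return order::higher;
    return order::unrelated;
  }

private:
  // True if `to` can be reached from `from` by following declared pairs,
  // which makes the relation transitive.
  bool reaches(const std::string& from, const std::string& to) const
  {
    std::set<std::string> seen;
    return reaches(from, to, seen);
  }

  bool reaches(const std::string& from, const std::string& to,
               std::set<std::string>& seen) const
  {
    if (!seen.insert(from).second)
      return false;
    auto it = above_.find(from);
    if (it == above_.end())
      return false;
    for (const std::string& next : it->second)
      if (next == to || reaches(next, to, seen))
        return true;
    return false;
  }

  std::map<std::string, std::set<std::string>> above_;
};
```

Arithmetic and logical operators could then be declared in separate places without forcing an order between, say, "+" and "&&".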
	259
	260	** RR conflicts
	261	See if we can use precedence between rules to solve RR conflicts. See
	262	what POSIX says.
	263
	264
	265	* $undefined
	266	From Hans:
	267	- If the Bison generated parser experiences an undefined number in the
	268	character range, that character is written out in diagnostic messages, in
	269	addition to the $undefined value.
	270
	271	Suggest: Change the name $undefined to undefined; looks better in outputs.
	272
	273
	274	* Default Action
	275	From Hans:
	276	- For use with my C++ parser, I transported the "switch (yyn)" statement
	277	that Bison writes to the bison.simple skeleton file. This way, I can remove
	278	the current default rule $$ = $1 implementation, which causes a double
	279	assignment to $$ which may not be OK under C++, replacing it with a
	280	"default:" part within the switch statement.
	281
	282	Note that the default rule $$ = $1, when typed, is perfectly OK under C,
	283	but in the C++ implementation I made, this rule is different from
	284	$<type_name>$ = $<type_name>1. I therefore think that one should implement
	285	a Bison option where every typed default rule is explicitly written out
	286	(same typed rules can of course be grouped together).
	287
	288	* Pre and post actions.
	289	From: Florian Krohm <florian@edamail.fishkill.ibm.com>
	290	Subject: YYACT_EPILOGUE
	291	To: bug-bison@gnu.org
	292	X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago
	293
	294	The other day I had the need for explicitly building the parse tree. I
	295	used %locations for that and defined YYLLOC_DEFAULT to call a function
	296	that returns the tree node for the production. Easy. But I also needed
	297	to assign the S-attribute to the tree node. That cannot be done in
	298	YYLLOC_DEFAULT, because it is invoked before the action is executed.
	299	The way I solved this was to define a macro YYACT_EPILOGUE that would
	300	be invoked after the action. For reasons of symmetry I also added
	301	YYACT_PROLOGUE. Although I had no use for that I can envision how it
	302	might come in handy for debugging purposes.
	303	All that is needed is to add
	304
	305	#if YYLSP_NEEDED
	306	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
	307	#else
	308	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
	309	#endif
	310
	311	at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.
	312
	313	I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
	314	to bison. If you're interested, I'll work on a patch.
	315
	316	* Better graphics
	317	Equip the parser with a means to create the (visual) parse tree.
	318
	319	* Complaint submessage indentation.
	320	We already have an implementation that works fairly well for named
	321	reference messages, but it would be nice to use it consistently for all
	322	submessages from Bison. For example, the "previous definition"
	323	submessage or the list of correct values for a %define variable might
	324	look better with indentation.
	325
	326	However, the current implementation makes the assumption that the
	327	location printed on the first line is not usually much shorter than the
	328	locations printed on the submessage lines that follow. That assumption
	329	may not hold true as often for some kinds of submessages especially if
	330	we ever support multiple grammar files.
	331
	332	Here's a proposal for how a new implementation might look:
	333
	334	http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00086.html
	335
	336
	337	Local Variables:
	338	mode: outline
	339	coding: utf-8
	340	End:
	341
	342	-----
	343
	344	Copyright (C) 2001-2004, 2006, 2008-2012 Free Software Foundation, Inc.
	345
	346	This file is part of Bison, the GNU Compiler Compiler.
	347
	348	This program is free software: you can redistribute it and/or modify
	349	it under the terms of the GNU General Public License as published by
	350	the Free Software Foundation, either version 3 of the License, or
	351	(at your option) any later version.
	352
	353	This program is distributed in the hope that it will be useful,
	354	but WITHOUT ANY WARRANTY; without even the implied warranty of
	355	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	356	GNU General Public License for more details.
	357
	358	You should have received a copy of the GNU General Public License
	359	along with this program. If not, see <http://www.gnu.org/licenses/>.