git.saurik.com Git - bison.git/blame_incremental

Commit	Line	Data
	1	* Short term
	2	** scan-code.l
	3	Avoid variables for format strings, as then GCC cannot check them.
	4	show_sub_messages should call show_sub_message.
	5
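A minimal sketch of the format-string point, assuming GCC or Clang. The names complain, report_literal and report_variable are hypothetical and are not taken from scan-code.l; they only contrast a literal format with one held in a variable.

```c
#include <stdarg.h>
#include <stdio.h>

/* Ask GCC/Clang to check calls against the format string; expands to
   nothing on other compilers.  */
#if defined __GNUC__
# define ATTRIBUTE_PRINTF(Fmt, Args) \
  __attribute__ ((format (printf, Fmt, Args)))
#else
# define ATTRIBUTE_PRINTF(Fmt, Args)
#endif

/* Hypothetical diagnostic helper (an assumption, not Bison's API).  */
static int complain (char *buf, size_t size, const char *format, ...)
  ATTRIBUTE_PRINTF (3, 4);

static int
complain (char *buf, size_t size, const char *format, ...)
{
  va_list args;
  va_start (args, format);
  int length = vsnprintf (buf, size, format, args);
  va_end (args);
  return length;
}

/* Checked: the format is a literal, so -Wformat can compare "%d"
   with the type of COLUMN at compile time.  */
static int
report_literal (char *buf, size_t size, int column)
{
  return complain (buf, size, "invalid column: %d", column);
}

/* Not checked: the format is held in a variable, so the compiler has
   no string to compare the arguments with at the call site.  */
static int
report_variable (char *buf, size_t size, int is_error, int column)
{
  const char *format = is_error ? "invalid column: %d" : "column: %d";
  return complain (buf, size, format, column);
}
```

Both functions behave the same at run time; the difference is only in what the compiler can diagnose.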
	6	** m4 names
	7	b4_shared_declarations no longer matches its name. Rename it to
	8	b4_parser_declaration, for instance.
	9
	10	** glr.cc: %defines
	11	It should not be mandatory.
	12
	13	** stack.hh
	14	Get rid of it. The original idea is nice, but actually it makes
	15	the code harder to follow, and uselessly different from the other
	16	skeletons.
	17
	18	** Variable names.
	19	What should we name `variant' and `lex_symbol'?
	20
	21	** Get rid of fake #lines [Bison: ...]
	22	Possibly as simple as checking whether the column number is nonnegative.
	23
	24	I have seen messages like the following from GCC.
	25
	26	<built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory
	27
	28
	29	** Discuss %printer/%destroy in the case of C++.
	30	It would be very nice to provide the symbol classes with an operator<<
	31	and a destructor. Unfortunately the syntax we have chosen for
	32	%destroy and %printer makes them hard to reuse. For instance, the user
	33	is invited to write something like
	34
	35	%printer { debug_stream() << $$; } <my_type>;
	36
	37	which is hard to reuse elsewhere since it wants to use
	38	"debug_stream()" to find the stream to use. The same applies to
	39	%destroy: we told the user she could use the members of the Parser
	40	class in the printers/destructors. That is not good for an operator<<,
	41	which is no longer bound to a particular parser: it is just a
	42	standalone symbol.
	43
	44	** Rename LR0.cc
	45	as lr0.cc, why upper case?
	46
	47	** bench several bisons.
	48	Enhance bench.pl with %b to run different bisons.
	49
	50	* Various
	51	** YYERRCODE
	52	Defined to 256, but not used, not documented. Probably the token
	53	number for the error token, which POSIX wants to be 256, but which
	54	Bison might renumber if the user used number 256. Keep, fix and document?
	55	Throw away?
	56
	57	Also, why don't we output the token name of the error token in the
	58	output? It is explicitly skipped:
	59
	60	/* Skip error token and tokens without identifier. */
	61	if (sym != errtoken && id)
	62
	63	Of course there are issues with name spaces, but if we disable this
	64	skipping we get something that seems simpler and more consistent than
	65	the special case YYERRCODE.
	66
	67	enum yytokentype {
	68	error = 256,
	69	// ...
	70	};
	71
	72
	73	We could (should?) also treat the case of the undef_token, which is
	74	numbered 257 for yylex, and 2 internally. Both appear for instance in
	75	toknum:
	76
	77	const unsigned short int
	78	parser::yytoken_number_[] =
	79	{
	80	0, 256, 257, 258, 259, 260, 261, 262, 263, 264,
	81
	82	while here
	83
	84	enum yytokentype {
	85	TOK_EOF = 0,
	86	TOK_EQ = 258,
	87
	88	so both 256 and 257 are "mysterious".
	89
	90	const char*
	91	const parser::yytname_[] =
	92	{
	93	"\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",
	94
	95
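A small sketch of why 256 and 257 look "mysterious": they have entries in the number and name tables even though the enum has no enumerator for them. The tables below only copy the first entries quoted above; token_name is a hypothetical helper, not part of any skeleton.

```c
#include <stddef.h>

/* First entries of the two tables quoted above (truncated there too).  */
static const unsigned short int yytoken_number[] =
{
  0, 256, 257, 258, 259
};

static const char *const yytname[] =
{
  "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\""
};

/* Return the name of the token whose yylex number is EXTERNAL, or NULL
   if it is not in the (truncated) table.  The index at which the number
   is found is the internal symbol number.  */
static const char *
token_name (int external)
{
  size_t count = sizeof yytoken_number / sizeof yytoken_number[0];
  for (size_t i = 0; i < count; ++i)
    if (yytoken_number[i] == external)
      return yytname[i];
  return NULL;
}
```

Here token_name (256) yields "error" and token_name (257) yields "$undefined", while the enum only starts naming tokens at 258.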
	96	** yychar == yyempty_
	97	The code in yyerrlab reads:
	98
	99	if (yychar <= YYEOF)
	100	{
	101	/* Return failure if at end of input. */
	102	if (yychar == YYEOF)
	103	YYABORT;
	104	}
	105
	106	There are only two yychar that can be <= YYEOF: YYEMPTY and YYEOF.
	107	But I can't produce the situation where yychar is YYEMPTY here. Is it
	108	really possible? The test suite does not exercise this case.
	109
	110	This shows that it would be worthwhile to add skeleton coverage
	111	analysis to the test suite.
	112
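A sketch of the reasoning about yychar, under stated assumptions: YYEMPTY is -2 and YYEOF is 0, the result of yylex is normalized so that anything not greater than YYEOF becomes YYEOF, and the branch for other tokens discards the lookahead. The enum error_action and both functions are hypothetical names used only for illustration.

```c
/* Assumed values, meant to mirror yacc.c.  */
enum { YYEMPTY = -2, YYEOF = 0 };

/* What the error label would do with the current lookahead.  */
typedef enum
{
  ACTION_ABORT,     /* at end of input: give up */
  ACTION_DISCARD,   /* ordinary token: drop it and go on (assumption) */
  ACTION_NOTHING    /* no lookahead was read at all */
} error_action;

/* Assumed normalization of the value returned by yylex.  */
static int
normalize_token (int lexed)
{
  return lexed <= YYEOF ? YYEOF : lexed;
}

/* Same shape as the test quoted above.  */
static error_action
classify_lookahead (int yychar)
{
  if (yychar <= YYEOF)
    {
      /* Return failure if at end of input.  */
      if (yychar == YYEOF)
        return ACTION_ABORT;
      return ACTION_NOTHING;
    }
  return ACTION_DISCARD;
}
```

Under these assumptions a value that went through normalize_token can never reach ACTION_NOTHING; only an explicit YYEMPTY can, which is the case the test suite does not exercise.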
	113	** Table definitions
	114	It should be very easy to factor the definition of the various tables,
	115	including the separation between declaration and definition. See for
	116	instance b4_table_define in lalr1.cc. This way, we could even factor
	117	C vs. C++ definitions.
	118
	119	* From lalr1.cc to yacc.c
	120	** Single stack
	121	Merging the three stacks in lalr1.cc simplified the code, prompted
	122	other improvements and also made it faster (probably because memory
	123	management is performed once instead of three times). I suggest that
	124	we do the same in yacc.c.
	125
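A minimal sketch of what a single stack could look like in C, assuming int-based semantic values and a four-field location. All type and function names here are hypothetical; this is not the layout used by lalr1.cc or yacc.c.

```c
#include <stdlib.h>

typedef struct
{
  int first_line, first_column, last_line, last_column;
} location;

typedef union { int ival; } semantic_value;   /* assumption: ints only */

/* One entry bundles what used to live on three parallel stacks.  */
typedef struct
{
  int state;
  semantic_value value;
  location loc;
} stack_symbol;

typedef struct
{
  stack_symbol *items;
  size_t size;
  size_t capacity;
} parser_stack;

/* Push one symbol; return 0 on memory exhaustion.  Growing the stack
   takes a single realloc instead of one per stack.  */
static int
stack_push (parser_stack *s, int state, semantic_value value, location loc)
{
  if (s->size == s->capacity)
    {
      size_t capacity = s->capacity ? 2 * s->capacity : 16;
      stack_symbol *items = realloc (s->items, capacity * sizeof *items);
      if (!items)
        return 0;
      s->items = items;
      s->capacity = capacity;
    }
  s->items[s->size].state = state;
  s->items[s->size].value = value;
  s->items[s->size].loc = loc;
  s->size++;
  return 1;
}

/* Pop N symbols (N must not exceed the current size).  */
static void
stack_pop (parser_stack *s, size_t n)
{
  s->size -= n;
}

static void
stack_free (parser_stack *s)
{
  free (s->items);
  s->items = NULL;
  s->size = s->capacity = 0;
}
```

A reduction of length N then becomes one stack_pop (&stack, N) followed by one stack_push, with state, value and location always moving together.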
	126	** yysyntax_error
	127	The code in glr.c and yacc.c is really alike; we can certainly factor
	128	some parts.
	129
	130
	131	* Report
	132
	133	** Figures
	134	Some statistics about the grammar and the parser would be useful,
	135	especially when asking the user to send some information about the
	136	grammars she is working on. We should probably also include some
	137	information about the variables (I'm not sure for instance we even
	138	specify what LR variant was used).
	139
	140	** GLR
	141	How would Paul like to display the conflicted actions? In particular,
	142	what should we do when two reductions are possible on a given lookahead token, but one is
	143	part of $default. Should we make the two reductions explicit, or just
	144	keep $default? See the following point.
	145
	146	** Disabled Reductions
	147	See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
	148	what we want to do.
	149
	150	** Documentation
	151	Extend with error productions. The hard part will probably be finding
	152	the right rule so that a single state does not exhibit too many yet
	153	undocumented ``features''. Maybe an empty action ought to be
	154	presented too. Shall we try to make a single grammar with all these
	155	features, or should we have several very small grammars?
	156
	157	** --report=conflict-path
	158	Provide better assistance for understanding the conflicts by providing
	159	a sample text exhibiting the (LALR) ambiguity. See the paper from
	160	DeRemer and Pennello: they already provide the algorithm.
	161
	162	** Statically check for potential ambiguities in GLR grammars. See
	163	<http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.
	164
	165
	166	* Extensions
	167
	168	** $-1
	169	We should find a means to provide access to values deep in the
	170	stack. For instance, instead of
	171
	172	baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }
	173
	174	we should be able to have:
	175
	176	foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }
	177
	178	Or something like this.
	179
	180	** %if and the like
	181	It should be possible to have %if/%else/%endif. The implementation is
	182	not clear: should it be lexical or syntactic? Vadim Maslow thinks it
	183	must be in the scanner: we must not parse what is in a switched off
	184	part of %if. Akim Demaille thinks it should be in the parser, so as
	185	to avoid falling into another CPP mistake.
	186
	187	** XML Output
	188	There are a couple of available extensions of Bison targeting some XML
	189	output. Some day we should consider including them. One issue is
	190	that they seem to be quite orthogonal to the parsing technique, and
	191	seem to depend mostly on the possibility to have some code triggered
	192	for each reduction. As a matter of fact, such hooks could also be
	193	used to generate the yydebug traces. Some generic scheme probably
	194	exists in there.
	195
	196	XML output for GNU Bison and gcc
	197	http://www.cs.may.ie/~jpower/Research/bisonXML/
	198
	199	XML output for GNU Bison
	200	http://yaxx.sourceforge.net/
	201
	202	* Unit rules
	203	Maybe we could expand unit rules, i.e., transform
	204
	205	exp: arith | bool;
	206	arith: exp '+' exp;
	207	bool: exp '&' exp;
	208
	209	into
	210
	211	exp: exp '+' exp | exp '&' exp;
	212
	213	when there are no actions. This can significantly speed up some
	214	grammars. I can't find the papers. In particular the book `LR
	215	parsing: Theory and Practice' is impossible to find, but according to
	216	`Parsing Techniques: a Practical Guide', it includes information about
	217	this issue. Does anybody have it?
	218
	219
	220
	221	* Documentation
	222
	223	** History/Bibliography
	224	Some history of Bison and some bibliography would be most welcome.
	225	Are there any Texinfo standards for bibliography?
	226
	227	* Coding system independence
	228	Paul notes:
	229
	230	Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
	231	255). It also assumes that the 8-bit character encoding is
	232	the same for the invocation of 'bison' as it is for the
	233	invocation of 'cc', but this is not necessarily true when
	234	people run bison on an ASCII host and then use cc on an EBCDIC
	235	host. I don't think these topics are worth our time
	236	addressing (unless we find a gung-ho volunteer for EBCDIC or
	237	PDP-10 ports :-) but they should probably be documented
	238	somewhere.
	239
	240	More importantly, Bison does not currently allow NUL bytes in
	241	tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
	242	the source code. This should get fixed.
	243
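A short sketch of why NUL bytes are a problem for code that treats tokens as C strings, and of one possible remedy: carrying the length alongside the bytes. The type token_text, the macro TOKEN_LITERAL and token_equal are hypothetical, not Bison names.

```c
#include <string.h>

/* A token spelling with an explicit length, so that embedded NUL
   bytes are preserved.  */
typedef struct
{
  const char *bytes;
  size_t length;
} token_text;

/* Build a token_text from a string literal; sizeof counts the bytes
   of the literal, including embedded NULs, plus the final NUL.  */
#define TOKEN_LITERAL(S) { (S), sizeof (S) - 1 }

/* Compare two spellings byte by byte, NUL bytes included.  */
static int
token_equal (token_text a, token_text b)
{
  return a.length == b.length
         && memcmp (a.bytes, b.bytes, a.length) == 0;
}
```

For the literal "x\0y", strlen stops at the first NUL and reports 1, while TOKEN_LITERAL records 3 bytes, so "x\0y" and "x\0z" stay distinct under token_equal although strcmp would call them equal.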
	244	* --graph
	245	Show reductions.
	246
	247	* Broken options ?
	248	** %token-table
	249	** Skeleton strategy
	250	Must we keep %token-table?
	251
	252	* Precedence
	253
	254	** Partial order
	255	It is unfortunate that there is a total order for precedence. It
	256	makes it impossible to have modular precedence information. We should
	257	move to partial orders (sounds like series/parallel orders to me).
	258
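One way a partial order could be represented, as a rough sketch: a "binds less tightly than" relation over a small set of precedence levels, closed transitively, where two levels may simply be unrelated. The names, the fixed MAX_LEVELS bound and the matrix representation are assumptions made for illustration, not a proposed Bison design.

```c
#include <stdbool.h>

enum { MAX_LEVELS = 8 };

typedef enum { PREC_NONE, PREC_LESS, PREC_EQUAL, PREC_GREATER } prec_cmp;

/* below[a][b] is true when level A binds less tightly than level B.  */
typedef struct
{
  bool below[MAX_LEVELS][MAX_LEVELS];
} prec_order;

/* Record one declared relation: LOW binds less tightly than HIGH.  */
static void
prec_declare_below (prec_order *order, int low, int high)
{
  order->below[low][high] = true;
}

/* Transitive closure (Warshall): if A < K and K < B then A < B.  */
static void
prec_close (prec_order *order)
{
  for (int k = 0; k < MAX_LEVELS; ++k)
    for (int a = 0; a < MAX_LEVELS; ++a)
      for (int b = 0; b < MAX_LEVELS; ++b)
        if (order->below[a][k] && order->below[k][b])
          order->below[a][b] = true;
}

/* Compare two levels; PREC_NONE means the order says nothing.  */
static prec_cmp
prec_compare (const prec_order *order, int a, int b)
{
  if (a == b)
    return PREC_EQUAL;
  if (order->below[a][b])
    return PREC_LESS;
  if (order->below[b][a])
    return PREC_GREATER;
  return PREC_NONE;
}
```

With such a relation, an arithmetic module and a logical module can each declare their own levels without being forced to rank against each other; a conflict between unrelated levels would then stay unresolved instead of being silently decided.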
	259	** RR conflicts
	260	See if we can use precedence between rules to solve RR conflicts. See
	261	what POSIX says.
	262
	263
	264	* $undefined
	265	From Hans:
	266	- If the Bison generated parser experiences an undefined number in the
	267	character range, that character is written out in diagnostic messages, in
	268	addition to the $undefined value.
	269
	270	Suggest: Change the name $undefined to undefined; looks better in outputs.
	271
	272
	273	* Default Action
	274	From Hans:
	275	- For use with my C++ parser, I transported the "switch (yyn)" statement
	276	that Bison writes to the bison.simple skeleton file. This way, I can remove
	277	the current default rule $$ = $1 implementation, which causes a double
	278	assignment to $$ which may not be OK under C++, replacing it with a
	279	"default:" part within the switch statement.
	280
	281	Note that the default rule $$ = $1, when typed, is perfectly OK under C,
	282	but in the C++ implementation I made, this rule is different from
	283	$<type_name>$ = $<type_name>1. I therefore think that one should implement
	284	a Bison option where every typed default rule is explicitly written out
	285	(rules of the same type can of course be grouped together).
	286
	287	* Pre and post actions.
	288	From: Florian Krohm <florian@edamail.fishkill.ibm.com>
	289	Subject: YYACT_EPILOGUE
	290	To: bug-bison@gnu.org
	291	X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago
	292
	293	The other day I had the need for explicitly building the parse tree. I
	294	used %locations for that and defined YYLLOC_DEFAULT to call a function
	295	that returns the tree node for the production. Easy. But I also needed
	296	to assign the S-attribute to the tree node. That cannot be done in
	297	YYLLOC_DEFAULT, because it is invoked before the action is executed.
	298	The way I solved this was to define a macro YYACT_EPILOGUE that would
	299	be invoked after the action. For reasons of symmetry I also added
	300	YYACT_PROLOGUE. Although I had no use for that I can envision how it
	301	might come in handy for debugging purposes.
	302	All that is needed is to add
	303
	304	#if YYLSP_NEEDED
	305	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
	306	#else
	307	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
	308	#endif
	309
	310	at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.
	311
	312	I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
	313	to bison. If you're interested, I'll work on a patch.
	314
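A sketch of how such hooks might fit together, for the variant without locations. YYACT_PROLOGUE and YYACT_EPILOGUE are the macros proposed in the message above and do not exist in Bison; reduce_add is a hypothetical stand-in for one case of the generated "switch (yyn)", and the int semantic value is an assumption.

```c
typedef int YYSTYPE;   /* assumption: plain int semantic values */

/* User side: observe the value computed by each action.  */
static int epilogue_calls;
static YYSTYPE last_value;
#define YYACT_EPILOGUE(Val, Rhs, Len) \
  (epilogue_calls++, last_value = (Val))

/* Skeleton side: hooks the user did not define default to no-ops.  */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Val, Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

/* Mock reduction for `exp: exp '+' exp'.  YYVSP points at the top of
   the value stack, YYLEN is the length of the right-hand side.  */
static YYSTYPE
reduce_add (YYSTYPE *yyvsp, int yylen)
{
  YYSTYPE yyval = yyvsp[1 - yylen];      /* default action: $$ = $1 */
  YYACT_PROLOGUE (yyval, (yyvsp - yylen), yylen);
  yyval = yyvsp[-2] + yyvsp[0];          /* user action: $$ = $1 + $3 */
  YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
  return yyval;
}
```

The epilogue sees the final value of $$, which is what YYLLOC_DEFAULT cannot offer since it runs before the action.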
	315	* Better graphics
	316	Equip the parser with a means to create the (visual) parse tree.
	317
	318	* Complaint submessage indentation.
	319	We already have an implementation that works fairly well for named
	320	reference messages, but it would be nice to use it consistently for all
	321	submessages from Bison. For example, the "previous definition"
	322	submessage or the list of correct values for a %define variable might
	323	look better with indentation.
	324
	325	However, the current implementation makes the assumption that the
	326	location printed on the first line is not usually much shorter than the
	327	locations printed on the submessage lines that follow. That assumption
	328	may not hold true as often for some kinds of submessages, especially if
	329	we ever support multiple grammar files.
	330
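A rough sketch of the assumption described above, not Bison's actual code: the submessage text is padded so that it starts a fixed indent to the right of where the first line's text started, which only works when the submessage location is not longer than the first one. The function names and the "LOCATION: TEXT" layout are assumptions.

```c
#include <stdio.h>
#include <string.h>

/* Format a top-level message as "LOC: MSG" and return the column at
   which MSG starts.  */
static size_t
format_message (char *buf, size_t size, const char *loc, const char *msg)
{
  snprintf (buf, size, "%s: %s", loc, msg);
  return strlen (loc) + 2;
}

/* Format a submessage so that its text starts INDENT columns to the
   right of COLUMN.  If LOC is too long for that, no padding is added
   and the alignment is lost.  */
static void
format_submessage (char *buf, size_t size, const char *loc,
                   const char *msg, size_t column, size_t indent)
{
  size_t start = strlen (loc) + 2;
  size_t target = column + indent;
  int padding = target > start ? (int) (target - start) : 0;
  snprintf (buf, size, "%s: %*s%s", loc, padding, "", msg);
}
```

With locations of equal width the submessage text ends up indented by two columns; with a longer location, such as one from another grammar file, the padding drops to zero and the indentation disappears.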
	331	Here's a proposal for how a new implementation might look:
	332
	333	http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00086.html
	334
	335
	336	Local Variables:
	337	mode: outline
	338	coding: utf-8
	339	End:
	340
	341	-----
	342
	343	Copyright (C) 2001-2004, 2006, 2008-2012 Free Software Foundation, Inc.
	344
	345	This file is part of Bison, the GNU Compiler Compiler.
	346
	347	This program is free software: you can redistribute it and/or modify
	348	it under the terms of the GNU General Public License as published by
	349	the Free Software Foundation, either version 3 of the License, or
	350	(at your option) any later version.
	351
	352	This program is distributed in the hope that it will be useful,
	353	but WITHOUT ANY WARRANTY; without even the implied warranty of
	354	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	355	GNU General Public License for more details.
	356
	357	You should have received a copy of the GNU General Public License
	358	along with this program. If not, see <http://www.gnu.org/licenses/>.