	1	* Short term
	2	** Variable names.
	3	What should we name `variant' and `lex_symbol'?
	4
	5	** Use b4_symbol in all the skeletons
	6	Move its definition to a more standard place and deploy it in the other
	7	skeletons. Then remove the older system, including the tables
	8	generated by output.c.
	9
	10	** Update the documentation on gnu.org
	11
	12	** Get rid of fake #lines [Bison: ...]
	13	Possibly as simple as checking whether the column number is nonnegative.
	14
	15	I have seen messages like the following from GCC.
	16
	17	<built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory
	18
	19
	20	** Discuss %printer/%destroy in the case of C++.
	21	It would be very nice to provide the symbol classes with an operator<<
	22	and a destructor. Unfortunately the syntax we have chosen for
	23	%destroy and %printer makes them hard to reuse. For instance, the user
	24	is invited to write something like
	25
	26	%printer { debug_stream() << $$; } <my_type>;
	27
	28	which is hard to reuse elsewhere since it wants to use
	29	"debug_stream()" to find the stream to use. The same applies to
	30	%destroy: we told the user she could use the members of the Parser
	31	class in the printers/destructors, which is not good for an operator<<
	32	since it is no longer bound to a particular parser: it is just a
	33	(standalone symbol).
	34
	35	** Rename LR0.cc
	36	as lr0.cc, why upper case?
	37
	38	** bench several bisons.
	39	Enhance bench.pl with %b to run different bisons.
	40
	41	* Various
	42	** Warnings
	43	Warnings about type tags that are used in printer and dtors, but not
	44	for symbols?
	45
	46	** YYERRCODE
	47	Defined to 256, but not used, not documented. Probably the token
	48	number for the error token, which POSIX wants to be 256, but which
	49	Bison might renumber if the user used number 256. Keep fix and doc?
	50	Throw away?
	51
	52	Also, why don't we output the token name of the error token in the
	53	output? It is explicitly skipped:
	54
	55	/* Skip error token and tokens without identifier. */
	56	if (sym != errtoken && id)
	57
	58	Of course there are issues with name spaces, but if we disable this
	59	skipping we get something that seems simpler and more consistent than
	60	the special case YYERRCODE.
	61
	62	enum yytokentype {
	63	error = 256,
	64	// ...
	65	};
	66
	67
	68	We could (should?) also treat the case of the undef_token, which is
	69	numbered 257 for yylex, and 2 internally. Both appear for instance in
	70	toknum:
	71
	72	const unsigned short int
	73	parser::yytoken_number_[] =
	74	{
	75	0, 256, 257, 258, 259, 260, 261, 262, 263, 264,
	76
	77	while here
	78
	79	enum yytokentype {
	80	TOK_EOF = 0,
	81	TOK_EQ = 258,
	82
	83	so both 256 and 257 are "mysterious".
	84
	85	const char*
	86	const parser::yytname_[] =
	87	{
	88	"\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",
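
The relation between the two numberings can be sketched as a lookup from the
number yylex returns to the internal symbol and its name. This is only an
illustration: the table contents are copied (truncated to five entries) from
the excerpts above, every sk_ name is hypothetical, and sending unknown
numbers to internal symbol 2 follows the description of undef_token above
rather than any particular skeleton.

```c
#include <assert.h>
#include <stddef.h>

/* External (yylex) token numbers, indexed by internal symbol number.
   Values taken from the yytoken_number_ excerpt, truncated.  */
static const unsigned short sk_token_number[] = { 0, 256, 257, 258, 259 };

/* Symbol names, indexed by internal symbol number (yytname_ excerpt).  */
static const char *const sk_name[] =
{
  "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\""
};

/* Hypothetical helper: return the name of the symbol whose external
   number is EXTERNAL.  Unknown numbers fall back to internal symbol 2,
   the undefined token.  */
static const char *
sk_token_name (int external)
{
  size_t n = sizeof sk_token_number / sizeof sk_token_number[0];
  assert (external >= 0);
  for (size_t i = 0; i < n; i++)
    if (sk_token_number[i] == external)
      return sk_name[i];
  return sk_name[2];
}
```

With such a table, 256 and 257 are no longer "mysterious": they resolve to
"error" and "$undefined" even though neither appears in yytokentype.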
	89
	90
	91	** YYFAIL
	92	It seems to be really obsolete now, shall we remove it?
	93
	94	** yychar == yyempty_
	95	The code in yyerrlab reads:
	96
	97	if (yychar <= YYEOF)
	98	{
	99	/* Return failure if at end of input. */
	100	if (yychar == YYEOF)
	101	YYABORT;
	102	}
	103
	104	There are only two values of yychar that can be <= YYEOF: YYEMPTY and
	105	YYEOF. But I can't produce the situation where yychar is YYEMPTY here.
	106	Is it really possible? The test suite does not exercise this case.
	107
	108	This shows that it would be interesting to add skeleton coverage
	109	analysis to the test suite.
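
To make the three cases explicit, here is a small model of that test. It is a
sketch, not the skeleton's code: it assumes YYEMPTY is -2 and YYEOF is 0, the
sk_ names are hypothetical, and the "discard" outcome for ordinary tokens is
an assumption about what follows the quoted excerpt.

```c
#include <assert.h>

/* Assumed values: YYEMPTY is -2 and YYEOF is 0.  */
enum { SK_YYEMPTY = -2, SK_YYEOF = 0 };

/* Hypothetical labels for what the error-recovery step does.  */
enum sk_outcome { SK_ABORT, SK_KEEP_LOOKAHEAD, SK_DISCARD };

/* Model of the quoted test on yychar.  */
static enum sk_outcome
sk_error_recovery_step (int yychar)
{
  /* Per the note above, nothing else is <= YYEOF.  */
  assert (yychar == SK_YYEMPTY || yychar >= SK_YYEOF);
  if (yychar <= SK_YYEOF)
    {
      /* Return failure if at end of input.  */
      if (yychar == SK_YYEOF)
        return SK_ABORT;
      /* Only reachable when yychar is YYEMPTY: the case the test
         suite does not exercise.  */
      return SK_KEEP_LOOKAHEAD;
    }
  /* Assumed: an ordinary lookahead token is discarded.  */
  return SK_DISCARD;
}
```

A coverage run over the skeleton would show whether the SK_KEEP_LOOKAHEAD
branch is ever taken.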
	110
	111	** Table definitions
	112	It should be very easy to factor the definition of the various tables,
	113	including the separation between declaration and definition. See for
	114	instance b4_table_define in lalr1.cc. This way, we could even factor
	115	C vs. C++ definitions.
	116
	117	* From lalr1.cc to yacc.c
	118	** Single stack
	119	Merging the three stacks in lalr1.cc simplified the code, prompted
	120	other improvements and also made it faster (probably because memory
	121	management is performed once instead of three times). I suggest that
	122	we do the same in yacc.c.
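
A minimal sketch of what a merged stack could look like in C, assuming one
entry bundles the state, the semantic value and the location. All names and
types here are hypothetical stand-ins, not those of yacc.c; the point is only
that growing the stack takes one reallocation instead of three.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the semantic value and location types.  */
typedef int sk_value;
typedef struct { int first_line, last_line; } sk_location;

/* One entry holds what would otherwise live in three parallel stacks.  */
typedef struct
{
  int state;
  sk_value value;
  sk_location location;
} sk_symbol;

typedef struct
{
  sk_symbol *items;
  size_t size;
  size_t capacity;
} sk_stack;

/* Push one entry.  A single realloc grows state, value and location
   together.  Return 0 on success, -1 on memory exhaustion.  */
static int
sk_push (sk_stack *s, int state, sk_value v, sk_location l)
{
  assert (s != NULL);
  if (s->size == s->capacity)
    {
      size_t new_capacity = s->capacity ? 2 * s->capacity : 16;
      sk_symbol *p = realloc (s->items, new_capacity * sizeof *p);
      if (!p)
        return -1;
      s->items = p;
      s->capacity = new_capacity;
    }
  s->items[s->size].state = state;
  s->items[s->size].value = v;
  s->items[s->size].location = l;
  s->size++;
  return 0;
}

/* Pop N entries, as reducing a rule with N right-hand side symbols would.  */
static void
sk_pop (sk_stack *s, size_t n)
{
  assert (s != NULL && n <= s->size);
  s->size -= n;
}
```

A stack starts as { NULL, 0, 0 } and the caller frees items when done.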
	123
	124	** yysyntax_error
	125	The code in glr.c and yacc.c is very similar; we can certainly factor
	126	some parts.
	127
	128
	129	* Yacc.c: CPP Macros
	130
	131	Do some people use YYPURE, YYLSP_NEEDED like we do in the test suite?
	132	They should not: it is not documented. But if they need to, let's
	133	find something clean (not like YYLSP_NEEDED...).
	134
	135	* Report
	136
	137	** Figures
	138	Some statistics about the grammar and the parser would be useful,
	139	especially when asking the user to send some information about the
	140	grammars she is working on. We should probably also include some
	141	information about the variables (I'm not sure for instance we even
	142	specify what LR variant was used).
	143
	144	** GLR
	145	How would Paul like to display the conflicted actions? In particular,
	146	what should happen when two reductions are possible on a given lookahead
	147	token, but one is part of $default? Should we make the two reductions
	148	explicit, or just keep $default? See the following point.
	149
	150	** Disabled Reductions
	151	See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
	152	what we want to do.
	153
	154	** Documentation
	155	Extend with error productions. The hard part will probably be finding
	156	the right rule so that a single state does not exhibit too many yet
	157	undocumented ``features''. Maybe an empty action ought to be
	158	presented too. Shall we try to make a single grammar with all these
	159	features, or should we have several very small grammars?
	160
	161	** --report=conflict-path
	162	Provide better assistance for understanding the conflicts by providing
	163	a sample text exhibiting the (LALR) ambiguity. See the paper from
	164	DeRemer and Penello: they already provide the algorithm.
	165
	166	** Statically check for potential ambiguities in GLR grammars. See
	167	<http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.
	168
	169
	170	* Extensions
	171
	172	** $-1
	173	We should find a way to provide access to values deep in the
	174	stack. For instance, instead of
	175
	176	baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }
	177
	178	we should be able to have:
	179
	180	foo($foo) bar($bar) baz($bar): qux($qux) { $baz = $foo + $bar + $qux; }
	181
	182	Or something like this.
	183
	184	** %if and the like
	185	It should be possible to have %if/%else/%endif. The implementation is
	186	not clear: should it be lexical or syntactic? Vadim Maslow thinks it
	187	must be in the scanner: we must not parse what is in a switched off
	188	part of %if. Akim Demaille thinks it should be in the parser, so as
	189	to avoid falling into another CPP mistake.
	190
	191	** XML Output
	192	There are a couple of available extensions of Bison targeting some XML
	193	output. Some day we should consider including them. One issue is
	194	that they seem to be quite orthogonal to the parsing technique, and
	195	seem to depend mostly on the possibility to have some code triggered
	196	for each reduction. As a matter of fact, such hooks could also be
	197	used to generate the yydebug traces. Some generic scheme probably
	198	exists in there.
	199
	200	XML output for GNU Bison and gcc
	201	http://www.cs.may.ie/~jpower/Research/bisonXML/
	202
	203	XML output for GNU Bison
	204	http://yaxx.sourceforge.net/
	205
	206	* Unit rules
	207	Maybe we could expand unit rules, i.e., transform
	208
	209	exp: arith | bool;
	210	arith: exp '+' exp;
	211	bool: exp '&' exp;
	212
	213	into
	214
	215	exp: exp '+' exp | exp '&' exp;
	216
	217	when there are no actions. This can significantly speed up some
	218	grammars. I can't find the papers. In particular the book `LR
	219	parsing: Theory and Practice' is impossible to find, but according to
	220	`Parsing Techniques: a Practical Guide', it includes information about
	221	this issue. Does anybody have it?
	222
	223
	224
	225	* Documentation
	226
	227	** History/Bibliography
	228	Some history of Bison and some bibliography would be most welcome.
	229	Are there any Texinfo standards for bibliography?
	230
	231	* Coding system independence
	232	Paul notes:
	233
	234	Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
	235	255). It also assumes that the 8-bit character encoding is
	236	the same for the invocation of 'bison' as it is for the
	237	invocation of 'cc', but this is not necessarily true when
	238	people run bison on an ASCII host and then use cc on an EBCDIC
	239	host. I don't think these topics are worth our time
	240	addressing (unless we find a gung-ho volunteer for EBCDIC or
	241	PDP-10 ports :-) but they should probably be documented
	242	somewhere.
	243
	244	More importantly, Bison does not currently allow NUL bytes in
	245	tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
	246	the source code. This should get fixed.
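
The difficulty with NUL bytes can be illustrated as follows, assuming token
text is handled as a NUL-terminated C string (an assumption made for the
example; the note above does not say where the limitation comes from). The
sk_ names are hypothetical. Carrying an explicit length keeps all three bytes
of "x\0y", whereas strlen stops at the first NUL.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token text that carries its length explicitly.  */
typedef struct
{
  const char *bytes;
  size_t length;
} sk_token_text;

/* Initializer for a string literal: sizeof counts every byte,
   embedded NULs included, plus the terminator we subtract.  */
#define SK_LITERAL(S) { (S), sizeof (S) - 1 }

/* Length as seen by code that treats the text as a C string.  */
static size_t
sk_c_string_length (sk_token_text t)
{
  assert (t.bytes != NULL);
  return strlen (t.bytes);
}
```

For SK_LITERAL ("x\0y") the stored length is 3, but sk_c_string_length
reports 1.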
	247
	248	* --graph
	249	Show reductions.
	250
	251	* Broken options ?
	252	** %token-table
	253	** Skeleton strategy
	254	Must we keep %token-table?
	255
	256	* Precedence
	257
	258	** Partial order
	259	It is unfortunate that there is a total order for precedence. It
	260	makes it impossible to have modular precedence information. We should
	261	move to partial orders (sounds like series/parallel orders to me).
	262
	263	** RR conflicts
	264	See if we can use precedence between rules to solve RR conflicts. See
	265	what POSIX says.
	266
	267
	268	* $undefined
	269	From Hans:
	270	- If the Bison generated parser experiences an undefined number in the
	271	character range, that character is written out in diagnostic messages, in
	272	addition to the $undefined value.
	273
	274	Suggest: Change the name $undefined to undefined; looks better in outputs.
	275
	276
	277	* Default Action
	278	From Hans:
	279	- For use with my C++ parser, I transported the "switch (yyn)" statement
	280	that Bison writes to the bison.simple skeleton file. This way, I can remove
	281	the current default rule $$ = $1 implementation, which causes a double
	282	assignment to $$ which may not be OK under C++, replacing it with a
	283	"default:" part within the switch statement.
	284
	285	Note that the default rule $$ = $1, when typed, is perfectly OK under C,
	286	but in the C++ implementation I made, this rule is different from
	287	$<type_name>$ = $<type_name>1. I therefore think that one should implement
	288	a Bison option where every typed default rule is explicitly written out
	289	(same typed rules can of course be grouped together).
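
The difference between the two strategies Hans describes can be sketched in
C as follows. This is a simplified model under stated assumptions: semantic
values are plain ints, rule 1 stands for a rule with an explicit action, and
all sk_ names are hypothetical. The counter only makes the double assignment
visible.

```c
#include <assert.h>

/* Counts assignments to the result, to expose double assignments.  */
static int sk_assignments = 0;

static void
sk_assign (int *yyval, int v)
{
  assert (yyval != 0);
  *yyval = v;
  sk_assignments++;
}

/* Strategy A: $$ = $1 is performed before the switch, so a rule with
   its own action assigns $$ twice.  */
static int
sk_reduce_preassigned (int rule, const int *rhs)
{
  int yyval;
  sk_assign (&yyval, rhs[0]);
  switch (rule)
    {
    case 1:
      sk_assign (&yyval, rhs[0] + rhs[1]);
      break;
    default:
      break;
    }
  return yyval;
}

/* Strategy B: $$ = $1 lives in the "default:" part of the switch,
   so every rule assigns $$ exactly once.  */
static int
sk_reduce_default_case (int rule, const int *rhs)
{
  int yyval;
  switch (rule)
    {
    case 1:
      sk_assign (&yyval, rhs[0] + rhs[1]);
      break;
    default:
      sk_assign (&yyval, rhs[0]);
      break;
    }
  return yyval;
}
```

Both functions return the same value for every rule; only the number of
assignments differs, which is what matters when assignment is not trivial.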
	290
	291	* Pre and post actions.
	292	From: Florian Krohm <florian@edamail.fishkill.ibm.com>
	293	Subject: YYACT_EPILOGUE
	294	To: bug-bison@gnu.org
	295	X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago
	296
	297	The other day I had the need for explicitly building the parse tree. I
	298	used %locations for that and defined YYLLOC_DEFAULT to call a function
	299	that returns the tree node for the production. Easy. But I also needed
	300	to assign the S-attribute to the tree node. That cannot be done in
	301	YYLLOC_DEFAULT, because it is invoked before the action is executed.
	302	The way I solved this was to define a macro YYACT_EPILOGUE that would
	303	be invoked after the action. For reasons of symmetry I also added
	304	YYACT_PROLOGUE. Although I had no use for that I can envision how it
	305	might come in handy for debugging purposes.
	306	All that is needed is to add
	307
	308	#if YYLSP_NEEDED
	309	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
	310	#else
	311	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
	312	#endif
	313
	314	at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.
	315
	316	I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
	317	to bison. If you're interested, I'll work on a patch.
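
A sketch of how such hooks could be wired, under stated assumptions: the
macro names and the three-argument form follow Florian's message (the variant
without locations), while the no-op defaults, the sk_ helpers and the toy
action are hypothetical and do not come from any skeleton.

```c
#include <assert.h>

/* Hypothetical trace buffer, so the hooks have an observable effect.  */
enum { SK_TRACE_MAX = 8 };
static int sk_trace[SK_TRACE_MAX];
static int sk_trace_len = 0;

static void
sk_record (int event)
{
  assert (sk_trace_len < SK_TRACE_MAX);
  sk_trace[sk_trace_len++] = event;
}

/* User side: define the hooks before the skeleton code is seen.  */
#define YYACT_PROLOGUE(Val, Rhs, Len) sk_record (-(Len))
#define YYACT_EPILOGUE(Val, Rhs, Len) sk_record (Val)

/* Skeleton side (assumed): hooks default to no-ops when undefined.  */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Val, Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

/* Toy reduction: the hooks bracket the user action.  */
static int
sk_run_action (int rule, const int *rhs, int len)
{
  int yyval = len > 0 ? rhs[0] : 0;
  YYACT_PROLOGUE (yyval, rhs, len);
  switch (rule)
    {
    case 1:
      yyval = rhs[0] + rhs[1];
      break;
    default:
      break;
    }
  YYACT_EPILOGUE (yyval, rhs, len);
  return yyval;
}
```

Here the epilogue sees the value computed by the action, which is what
YYLLOC_DEFAULT cannot do since it runs before the action.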
	318
	319	* Better graphics
	320	Equip the parser with a means to create the (visual) parse tree.
	321
	322	* Complaint submessage indentation.
	323	We already have an implementation that works fairly well for named
	324	reference messages, but it would be nice to use it consistently for all
	325	submessages from Bison. For example, the "previous definition"
	326	submessage or the list of correct values for a %define variable might
	327	look better with indentation.
	328
	329	However, the current implementation makes the assumption that the
	330	location printed on the first line is not usually much shorter than the
	331	locations printed on the submessage lines that follow. That assumption
	332	may not hold true as often for some kinds of submessages especially if
	333	we ever support multiple grammar files.
	334
	335	Here's a proposal for how a new implementation might look:
	336
	337	http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00086.html
	338
	339
	340	Local Variables:
	341	mode: outline
	342	coding: utf-8
	343	End:
	344
	345	-----
	346
	347	Copyright (C) 2001-2004, 2006, 2008-2012 Free Software Foundation, Inc.
	348
	349	This file is part of Bison, the GNU Compiler Compiler.
	350
	351	This program is free software: you can redistribute it and/or modify
	352	it under the terms of the GNU General Public License as published by
	353	the Free Software Foundation, either version 3 of the License, or
	354	(at your option) any later version.
	355
	356	This program is distributed in the hope that it will be useful,
	357	but WITHOUT ANY WARRANTY; without even the implied warranty of
	358	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	359	GNU General Public License for more details.
	360
	361	You should have received a copy of the GNU General Public License
	362	along with this program. If not, see <http://www.gnu.org/licenses/>.