git.saurik.com Git - bison.git/blame_incremental

... / ...

Commit	Line	Data
	1	* Short term
	2	** Graphviz display code thoughts
	3	The code for the --graph option is over two files: print_graph, and
	4	graphviz. I believe this is because Bison used to also produce VCG graphs,
	5	but since this is no longer true, maybe we could consider these files for
	6	fusion.
	7
	8	Little effort factoring seems to have been given to factoring in these files,
	9	and their print-xml and print counterpart. We would very much like to re-use
	10	the pretty format of states from .output in the .dot
	11
	12	Also, the underscore in print_graph.[ch] isn't very fitting considering
	13	the dashes in the other filenames.
	14
	15	** Variable names.
	16	What should we name `variant' and `lex_symbol'?
	17
	18	** Use b4_symbol in all the skeleton
	19	Move its definition in the more standard places and deploy it in other
	20	skeletons. Then remove the older system, including the tables
	21	generated by output.c
	22
	23	** Update the documentation on gnu.org
	24
	25	** Get rid of fake #lines [Bison: ...]
	26	Possibly as simple as checking whether the column number is nonnegative.
	27
	28	I have seen messages like the following from GCC.
	29
	30	<built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory
	31
	32
	33	** Discuss about %printer/%destroy in the case of C++.
	34	It would be very nice to provide the symbol classes with an operator<<
	35	and a destructor. Unfortunately the syntax we have chosen for
	36	%destroy and %printer make them hard to reuse. For instance, the user
	37	is invited to write something like
	38
	39	%printer { debug_stream() << $$; } <my_type>;
	40
	41	which is hard to reuse elsewhere since it wants to use
	42	"debug_stream()" to find the stream to use. The same applies to
	43	%destroy: we told the user she could use the members of the Parser
	44	class in the printers/destructors, which is not good for an operator<<
	45	since it is no longer bound to a particular parser, it's just a
	46	(standalone symbol).
	47
	48	** Rename LR0.cc
	49	as lr0.cc, why upper case?
	50
	51	** bench several bisons.
	52	Enhance bench.pl with %b to run different bisons.
	53
	54	* Various
	55	** Warnings
	56	Warnings about type tags that are used in printer and dtors, but not
	57	for symbols?
	58
	59	** YYERRCODE
	60	Defined to 256, but not used, not documented. Probably the token
	61	number for the error token, which POSIX wants to be 256, but which
	62	Bison might renumber if the user used number 256. Keep fix and doc?
	63	Throw away?
	64
	65	Also, why don't we output the token name of the error token in the
	66	output? It is explicitly skipped:
	67
	68	/* Skip error token and tokens without identifier. */
	69	if (sym != errtoken && id)
	70
	71	Of course there are issues with name spaces, but if we disable we have
	72	something which seems to be more simpler and more consistent instead
	73	of the special case YYERRCODE.
	74
	75	enum yytokentype {
	76	error = 256,
	77	// ...
	78	};
	79
	80
	81	We could (should?) also treat the case of the undef_token, which is
	82	numbered 257 for yylex, and 2 internal. Both appear for instance in
	83	toknum:
	84
	85	const unsigned short int
	86	parser::yytoken_number_[] =
	87	{
	88	0, 256, 257, 258, 259, 260, 261, 262, 263, 264,
	89
	90	while here
	91
	92	enum yytokentype {
	93	TOK_EOF = 0,
	94	TOK_EQ = 258,
	95
	96	so both 256 and 257 are "mysterious".
	97
	98	const char*
	99	const parser::yytname_[] =
	100	{
	101	"\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",
	102
	103
	104	** YYFAIL
	105	It is seems to be really obsolete now, shall we remove it?
	106
	107	** yychar == yyempty_
	108	The code in yyerrlab reads:
	109
	110	if (yychar <= YYEOF)
	111	{
	112	/* Return failure if at end of input. */
	113	if (yychar == YYEOF)
	114	YYABORT;
	115	}
	116
	117	There are only two yychar that can be <= YYEOF: YYEMPTY and YYEOF.
	118	But I can't produce the situation where yychar is YYEMPTY here, is it
	119	really possible? The test suite does not exercise this case.
	120
	121	This shows that it would be interesting to manage to install skeleton
	122	coverage analysis to the test suite.
	123
	124	** Table definitions
	125	It should be very easy to factor the definition of the various tables,
	126	including the separation bw declaration and definition. See for
	127	instance b4_table_define in lalr1.cc. This way, we could even factor
	128	C vs. C++ definitions.
	129
	130	* From lalr1.cc to yacc.c
	131	** Single stack
	132	Merging the three stacks in lalr1.cc simplified the code, prompted for
	133	other improvements and also made it faster (probably because memory
	134	management is performed once instead of three times). I suggest that
	135	we do the same in yacc.c.
	136
	137	** yysyntax_error
	138	The code bw glr.c and yacc.c is really alike, we can certainly factor
	139	some parts.
	140
	141
	142	* Report
	143
	144	** Figures
	145	Some statistics about the grammar and the parser would be useful,
	146	especially when asking the user to send some information about the
	147	grammars she is working on. We should probably also include some
	148	information about the variables (I'm not sure for instance we even
	149	specify what LR variant was used).
	150
	151	** GLR
	152	How would Paul like to display the conflicted actions? In particular,
	153	what when two reductions are possible on a given lookahead token, but one is
	154	part of $default. Should we make the two reductions explicit, or just
	155	keep $default? See the following point.
	156
	157	** Disabled Reductions
	158	See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
	159	what we want to do.
	160
	161	** Documentation
	162	Extend with error productions. The hard part will probably be finding
	163	the right rule so that a single state does not exhibit too many yet
	164	undocumented ``features''. Maybe an empty action ought to be
	165	presented too. Shall we try to make a single grammar with all these
	166	features, or should we have several very small grammars?
	167
	168	** --report=conflict-path
	169	Provide better assistance for understanding the conflicts by providing
	170	a sample text exhibiting the (LALR) ambiguity. See the paper from
	171	DeRemer and Penello: they already provide the algorithm.
	172
	173	** Statically check for potential ambiguities in GLR grammars. See
	174	<http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.
	175
	176
	177	* Extensions
	178
	179	** $-1
	180	We should find a means to provide an access to values deep in the
	181	stack. For instance, instead of
	182
	183	baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }
	184
	185	we should be able to have:
	186
	187	foo($foo) bar($bar) baz($bar): qux($qux) { $baz = $foo + $bar + $qux; }
	188
	189	Or something like this.
	190
	191	** %if and the like
	192	It should be possible to have %if/%else/%endif. The implementation is
	193	not clear: should it be lexical or syntactic. Vadim Maslow thinks it
	194	must be in the scanner: we must not parse what is in a switched off
	195	part of %if. Akim Demaille thinks it should be in the parser, so as
	196	to avoid falling into another CPP mistake.
	197
	198	** XML Output
	199	There are couple of available extensions of Bison targeting some XML
	200	output. Some day we should consider including them. One issue is
	201	that they seem to be quite orthogonal to the parsing technique, and
	202	seem to depend mostly on the possibility to have some code triggered
	203	for each reduction. As a matter of fact, such hooks could also be
	204	used to generate the yydebug traces. Some generic scheme probably
	205	exists in there.
	206
	207	XML output for GNU Bison and gcc
	208	http://www.cs.may.ie/~jpower/Research/bisonXML/
	209
	210	XML output for GNU Bison
	211	http://yaxx.sourceforge.net/
	212
	213	* Unit rules
	214	Maybe we could expand unit rules, i.e., transform
	215
	216	exp: arith \| bool;
	217	arith: exp '+' exp;
	218	bool: exp '&' exp;
	219
	220	into
	221
	222	exp: exp '+' exp \| exp '&' exp;
	223
	224	when there are no actions. This can significantly speed up some
	225	grammars. I can't find the papers. In particular the book `LR
	226	parsing: Theory and Practice' is impossible to find, but according to
	227	`Parsing Techniques: a Practical Guide', it includes information about
	228	this issue. Does anybody have it?
	229
	230
	231
	232	* Documentation
	233
	234	** History/Bibliography
	235	Some history of Bison and some bibliography would be most welcome.
	236	Are there any Texinfo standards for bibliography?
	237
	238	* Coding system independence
	239	Paul notes:
	240
	241	Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
	242	255). It also assumes that the 8-bit character encoding is
	243	the same for the invocation of 'bison' as it is for the
	244	invocation of 'cc', but this is not necessarily true when
	245	people run bison on an ASCII host and then use cc on an EBCDIC
	246	host. I don't think these topics are worth our time
	247	addressing (unless we find a gung-ho volunteer for EBCDIC or
	248	PDP-10 ports :-) but they should probably be documented
	249	somewhere.
	250
	251	More importantly, Bison does not currently allow NUL bytes in
	252	tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
	253	the source code. This should get fixed.
	254
	255	* --graph
	256	Show reductions.
	257
	258	* Broken options ?
	259	** %token-table
	260	** Skeleton strategy
	261	Must we keep %token-table?
	262
	263	* Precedence
	264
	265	** Partial order
	266	It is unfortunate that there is a total order for precedence. It
	267	makes it impossible to have modular precedence information. We should
	268	move to partial orders (sounds like series/parallel orders to me).
	269
	270	** RR conflicts
	271	See if we can use precedence between rules to solve RR conflicts. See
	272	what POSIX says.
	273
	274
	275	* $undefined
	276	From Hans:
	277	- If the Bison generated parser experiences an undefined number in the
	278	character range, that character is written out in diagnostic messages, an
	279	addition to the $undefined value.
	280
	281	Suggest: Change the name $undefined to undefined; looks better in outputs.
	282
	283
	284	* Default Action
	285	From Hans:
	286	- For use with my C++ parser, I transported the "switch (yyn)" statement
	287	that Bison writes to the bison.simple skeleton file. This way, I can remove
	288	the current default rule $$ = $1 implementation, which causes a double
	289	assignment to $$ which may not be OK under C++, replacing it with a
	290	"default:" part within the switch statement.
	291
	292	Note that the default rule $$ = $1, when typed, is perfectly OK under C,
	293	but in the C++ implementation I made, this rule is different from
	294	$<type_name>$ = $<type_name>1. I therefore think that one should implement
	295	a Bison option where every typed default rule is explicitly written out
	296	(same typed ruled can of course be grouped together).
	297
	298	* Pre and post actions.
	299	From: Florian Krohm <florian@edamail.fishkill.ibm.com>
	300	Subject: YYACT_EPILOGUE
	301	To: bug-bison@gnu.org
	302	X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago
	303
	304	The other day I had the need for explicitly building the parse tree. I
	305	used %locations for that and defined YYLLOC_DEFAULT to call a function
	306	that returns the tree node for the production. Easy. But I also needed
	307	to assign the S-attribute to the tree node. That cannot be done in
	308	YYLLOC_DEFAULT, because it is invoked before the action is executed.
	309	The way I solved this was to define a macro YYACT_EPILOGUE that would
	310	be invoked after the action. For reasons of symmetry I also added
	311	YYACT_PROLOGUE. Although I had no use for that I can envision how it
	312	might come in handy for debugging purposes.
	313	All is needed is to add
	314
	315	#if YYLSP_NEEDED
	316	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
	317	#else
	318	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
	319	#endif
	320
	321	at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.
	322
	323	I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
	324	to bison. If you're interested, I'll work on a patch.
	325
	326	* Better graphics
	327	Equip the parser with a means to create the (visual) parse tree.
	328
	329	* Complaint submessage indentation.
	330	We already have an implementation that works fairly well for named
	331	reference messages, but it would be nice to use it consistently for all
	332	submessages from Bison. For example, the "previous definition"
	333	submessage or the list of correct values for a %define variable might
	334	look better with indentation.
	335
	336	However, the current implementation makes the assumption that the
	337	location printed on the first line is not usually much shorter than the
	338	locations printed on the submessage lines that follow. That assumption
	339	may not hold true as often for some kinds of submessages especially if
	340	we ever support multiple grammar files.
	341
	342	Here's a proposal for how a new implementation might look:
	343
	344	http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00086.html
	345
	346
	347	Local Variables:
	348	mode: outline
	349	coding: utf-8
	350	End:
	351
	352	-----
	353
	354	Copyright (C) 2001-2004, 2006, 2008-2012 Free Software Foundation, Inc.
	355
	356	This file is part of Bison, the GNU Compiler Compiler.
	357
	358	This program is free software: you can redistribute it and/or modify
	359	it under the terms of the GNU General Public License as published by
	360	the Free Software Foundation, either version 3 of the License, or
	361	(at your option) any later version.
	362
	363	This program is distributed in the hope that it will be useful,
	364	but WITHOUT ANY WARRANTY; without even the implied warranty of
	365	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	366	GNU General Public License for more details.
	367
	368	You should have received a copy of the GNU General Public License
	369	along with this program. If not, see <http://www.gnu.org/licenses/>.