Commit	Line	Data
	1	* Short term
	2	** Graphviz display code thoughts
	3	The code for the --graph option is spread over two files: print_graph and
	4	graphviz. I believe this is because Bison used to also produce VCG graphs,
	5	but since this is no longer true, maybe we could consider these files for
	6	fusion.
	7
	8	Little effort seems to have been given to factoring these files
	9	and their print-xml and print counterparts. We would very much like to re-use
	10	the pretty format of states from .output in the .dot file.
	11
	12	Also, the underscore in print_graph.[ch] isn't very fitting considering
	13	the dashes in the other filenames.
	14
	15	** push-parser
	16	Check it too when checking the different kinds of parsers. And be
	17	sure to check that the initial-action is performed once per parsing.
	18
	19	** m4 names
	20	b4_shared_declarations no longer does what its name says. Rename it,
	21	for instance to b4_parser_declaration.
	22
	23	** yychar in lalr1.cc
	24	There is a large difference between maint and master in the handling of
	25	yychar (which was removed in lalr1.cc). See what needs to be
	26	back-ported.
	27
	28
	29	/* User semantic actions sometimes alter yychar, and that requires
	30	that yytoken be updated with the new translation. We take the
	31	approach of translating immediately before every use of yytoken.
	32	One alternative is translating here after every semantic action,
	33	but that translation would be missed if the semantic action
	34	invokes YYABORT, YYACCEPT, or YYERROR immediately after altering
	35	yychar. In the case of YYABORT or YYACCEPT, an incorrect
	36	destructor might then be invoked immediately. In the case of
	37	YYERROR, subsequent parser actions might lead to an incorrect
	38	destructor call or verbose syntax error message before the
	39	lookahead is translated. */
	40
	41	/* Make sure we have latest lookahead translation. See comments at
	42	user semantic actions for why this is necessary. */
	43	yytoken = yytranslate_ (yychar);
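To make the "translate immediately before every use" approach concrete,
here is a minimal C sketch. The token codes, the symbol numbers, and the
names translate and current_token are made up for illustration; they are
not the skeleton's actual tables or API.

```c
#include <assert.h>

/* Hypothetical external token codes, as a scanner might return them. */
enum { TOK_EOF = 0, TOK_NUM = 258, TOK_ID = 259 };

/* Hypothetical internal symbol numbers. */
enum { SYM_EOF = 0, SYM_ERROR = 1, SYM_UNDEF = 2, SYM_NUM = 3, SYM_ID = 4 };

/* Stand-in for yytranslate_: map an external code to an internal symbol. */
static int translate(int yychar)
{
  switch (yychar)
    {
    case TOK_EOF: return SYM_EOF;
    case TOK_NUM: return SYM_NUM;
    case TOK_ID:  return SYM_ID;
    default:      return SYM_UNDEF;
    }
}

struct parser { int yychar; };

/* Every consumer asks for the token through this accessor, so a
   semantic action that overwrote yychar cannot leave a stale
   translation behind, whatever it does next.  */
static int current_token(const struct parser *p)
{
  assert(p);
  return translate(p->yychar);
}
```

The point of the design is that no cached yytoken exists between uses, so
there is nothing to forget to refresh on the YYABORT, YYACCEPT, or YYERROR
paths.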
	44
	45
	46	** stack.hh
	47	Get rid of it. The original idea is nice, but actually it makes
	48	the code harder to follow, and uselessly different from the other
	49	skeletons.
	50
	51	** Get rid of fake #lines [Bison: ...]
	52	Possibly as simple as checking whether the column number is nonnegative.
	53
	54	I have seen messages like the following from GCC.
	55
	56	<built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory
	57
	58
	59	** Discuss %printer/%destroy in the case of C++.
	60	It would be very nice to provide the symbol classes with an operator<<
	61	and a destructor. Unfortunately the syntax we have chosen for
	62	%destroy and %printer makes them hard to reuse. For instance, the user
	63	is invited to write something like
	64
	65	%printer { debug_stream() << $$; } <my_type>;
	66
	67	which is hard to reuse elsewhere since it wants to use
	68	"debug_stream()" to find the stream to use. The same applies to
	69	%destroy: we told the user she could use the members of the Parser
	70	class in the printers/destructors, which is not good for an operator<<
	71	since it is no longer bound to a particular parser; it is just a
	72	standalone symbol.
	73
	74	** Rename LR0.cc
	75	as lr0.cc, why upper case?
	76
	77	* Various
	78	** YYERRCODE
	79	Defined to 256, but not used, not documented. Probably the token
	80	number for the error token, which POSIX wants to be 256, but which
	81	Bison might renumber if the user used number 256. Keep, fix and document?
	82	Throw away?
	83
	84	Also, why don't we output the token name of the error token in the
	85	output? It is explicitly skipped:
	86
	87	/* Skip error token and tokens without identifier. */
	88	if (sym != errtoken && id)
	89
	90	Of course there are issues with name spaces, but if we disable this skip we
	91	get something which seems to be simpler and more consistent than
	92	the special case YYERRCODE.
	93
	94	enum yytokentype {
	95	error = 256,
	96	// ...
	97	};
	98
	99
	100	We could (should?) also treat the case of the undef_token, which is
	101	numbered 257 for yylex, and 2 internally. Both appear for instance in
	102	toknum:
	103
	104	const unsigned short int
	105	parser::yytoken_number_[] =
	106	{
	107	0, 256, 257, 258, 259, 260, 261, 262, 263, 264,
	108
	109	while here
	110
	111	enum yytokentype {
	112	TOK_EOF = 0,
	113	TOK_EQ = 258,
	114
	115	so both 256 and 257 are "mysterious".
	116
	117	const char*
	118	const parser::yytname_[] =
	119	{
	120	"\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",
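A small C sketch of how the two tables line up, using only the first five
entries quoted above. The helper external_token_name is hypothetical; it
only shows that 256 and 257 do have names internally even though
yytokentype does not list them.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* First entries of the tables quoted above (truncated on purpose). */
static const unsigned short toknum[] = { 0, 256, 257, 258, 259 };
static const char *const tname[] =
{
  "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\""
};

/* Hypothetical helper: name of the symbol whose external (yylex)
   number is NUMBER, or NULL if it is not in the truncated table.  */
static const char *external_token_name(int number)
{
  size_t n = sizeof toknum / sizeof toknum[0];
  assert(n == sizeof tname / sizeof tname[0]);
  for (size_t i = 0; i < n; i++)
    if (toknum[i] == number)
      return tname[i];
  return NULL;
}
```

With these tables, 256 maps to "error" and 257 to "$undefined", which are
exactly the two numbers missing from the enum.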
	121
	122
	123	** yychar == yyempty_
	124	The code in yyerrlab reads:
	125
	126	if (yychar <= YYEOF)
	127	{
	128	/* Return failure if at end of input. */
	129	if (yychar == YYEOF)
	130	YYABORT;
	131	}
	132
	133	There are only two yychar that can be <= YYEOF: YYEMPTY and YYEOF.
	134	But I can't produce the situation where yychar is YYEMPTY here. Is it
	135	really possible? The test suite does not exercise this case.
	136
	137	This shows that it would be interesting to add skeleton
	138	coverage analysis to the test suite.
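The case analysis can be spelled out in a few lines of C, assuming the
yacc.c conventions YYEMPTY == -2 and YYEOF == 0. The enum recovery_step and
the function classify_lookahead are hypothetical, and the "discard" outcome
is an assumption about the code that surrounds the quoted fragment.

```c
#include <assert.h>

/* Assumed values, following the yacc.c conventions. */
enum { YYEMPTY = -2, YYEOF = 0 };

/* Hypothetical names for what error recovery does with the lookahead. */
enum recovery_step { KEEP_LOOKAHEAD, ABORT_PARSE, DISCARD_LOOKAHEAD };

static enum recovery_step classify_lookahead(int yychar)
{
  /* Same shape as the quoted test in yyerrlab. */
  if (yychar <= YYEOF)
    {
      /* Return failure if at end of input. */
      if (yychar == YYEOF)
        return ABORT_PARSE;
      /* Only YYEMPTY is left: there is no lookahead to discard.
         This is the branch the test suite never reaches.  */
      assert(yychar == YYEMPTY);
      return KEEP_LOOKAHEAD;
    }
  /* Assumption: a real lookahead token is discarded here. */
  return DISCARD_LOOKAHEAD;
}
```

A coverage tool run over the skeleton would report the KEEP_LOOKAHEAD
branch as unexecuted, which is the kind of information the note asks for.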
	139
	140	* From lalr1.cc to yacc.c
	141	** Single stack
	142	Merging the three stacks in lalr1.cc simplified the code, prompted
	143	other improvements and also made it faster (probably because memory
	144	management is performed once instead of three times). I suggest that
	145	we do the same in yacc.c.
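A rough C sketch of what a single stack could look like: one array of
records holding state, value, and location, so that growing the stack is
one realloc instead of three. All type and function names here are
hypothetical, and the int value stands in for YYSTYPE; this is not the
current yacc.c layout.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for YYLTYPE and for one stack entry. */
typedef struct { int first_line, first_column, last_line, last_column; } location;
typedef struct { int state; int value; location loc; } stack_symbol;

typedef struct { stack_symbol *base; size_t size, capacity; } symbol_stack;

/* Push one entry.  Return 1 on success, 0 on memory exhaustion. */
static int stack_push(symbol_stack *s, int state, int value, location loc)
{
  assert(s);
  if (s->size == s->capacity)
    {
      /* A single reallocation covers states, values and locations. */
      size_t cap = s->capacity ? 2 * s->capacity : 16;
      stack_symbol *p = realloc(s->base, cap * sizeof *p);
      if (!p)
        return 0;
      s->base = p;
      s->capacity = cap;
    }
  s->base[s->size].state = state;
  s->base[s->size].value = value;
  s->base[s->size].loc = loc;
  s->size++;
  return 1;
}

/* Pop N entries, as a reduction of a rule of length N would. */
static void stack_pop(symbol_stack *s, size_t n)
{
  assert(s && n <= s->size);
  s->size -= n;
}

static void stack_free(symbol_stack *s)
{
  assert(s);
  free(s->base);
  s->base = NULL;
  s->size = s->capacity = 0;
}
```

A zero-initialized symbol_stack is a valid empty stack, so callers need no
separate init function.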
	146
	147	** yysyntax_error
	148	The code in glr.c and yacc.c is very similar; we can certainly factor
	149	some parts.
	150
	151
	152	* Report
	153
	154	** Figures
	155	Some statistics about the grammar and the parser would be useful,
	156	especially when asking the user to send some information about the
	157	grammars she is working on. We should probably also include some
	158	information about the variables (I'm not sure for instance we even
	159	specify what LR variant was used).
	160
	161	** GLR
	162	How would Paul like to display the conflicted actions? In particular,
	163	what should happen when two reductions are possible on a given lookahead
	164	token, but one is part of $default? Should we make the two reductions
	165	explicit, or just keep $default? See the following point.
	166
	167	** Disabled Reductions
	168	See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
	169	what we want to do.
	170
	171	** Documentation
	172	Extend with error productions. The hard part will probably be finding
	173	the right rule so that a single state does not exhibit too many yet
	174	undocumented ``features''. Maybe an empty action ought to be
	175	presented too. Shall we try to make a single grammar with all these
	176	features, or should we have several very small grammars?
	177
	178	** --report=conflict-path
	179	Provide better assistance for understanding the conflicts by providing
	180	a sample text exhibiting the (LALR) ambiguity. See the paper from
	181	DeRemer and Penello: they already provide the algorithm.
	182
	183	** Statically check for potential ambiguities in GLR grammars. See
	184	<http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.
	185
	186
	187	* Extensions
	188
	189	** $-1
	190	We should find a means to provide access to values deep in the
	191	stack. For instance, instead of
	192
	193	baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }
	194
	195	we should be able to have:
	196
	197	foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }
	198
	199	Or something like this.
	200
	201	** %if and the like
	202	It should be possible to have %if/%else/%endif. The implementation is
	203	not clear: should it be lexical or syntactic? Vadim Maslow thinks it
	204	must be in the scanner: we must not parse what is in a switched off
	205	part of %if. Akim Demaille thinks it should be in the parser, so as
	206	to avoid falling into another CPP mistake.
	207
	208	** XML Output
	209	There are a couple of available extensions of Bison targeting some XML
	210	output. Some day we should consider including them. One issue is
	211	that they seem to be quite orthogonal to the parsing technique, and
	212	seem to depend mostly on the possibility to have some code triggered
	213	for each reduction. As a matter of fact, such hooks could also be
	214	used to generate the yydebug traces. Some generic scheme probably
	215	exists in there.
	216
	217	XML output for GNU Bison and gcc
	218	http://www.cs.may.ie/~jpower/Research/bisonXML/
	219
	220	XML output for GNU Bison
	221	http://yaxx.sourceforge.net/
	222
	223	* Unit rules
	224	Maybe we could expand unit rules, i.e., transform
	225
	226	exp: arith | bool;
	227	arith: exp '+' exp;
	228	bool: exp '&' exp;
	229
	230	into
	231
	232	exp: exp '+' exp | exp '&' exp;
	233
	234	when there are no actions. This can significantly speed up some
	235	grammars. I can't find the papers. In particular the book `LR
	236	parsing: Theory and Practice' is impossible to find, but according to
	237	`Parsing Techniques: a Practical Guide', it includes information about
	238	this issue. Does anybody have it?
	239
	240
	241
	242	* Documentation
	243
	244	** History/Bibliography
	245	Some history of Bison and some bibliography would be most welcome.
	246	Are there any Texinfo standards for bibliography?
	247
	248	* Coding system independence
	249	Paul notes:
	250
	251	Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
	252	255). It also assumes that the 8-bit character encoding is
	253	the same for the invocation of 'bison' as it is for the
	254	invocation of 'cc', but this is not necessarily true when
	255	people run bison on an ASCII host and then use cc on an EBCDIC
	256	host. I don't think these topics are worth our time
	257	addressing (unless we find a gung-ho volunteer for EBCDIC or
	258	PDP-10 ports :-) but they should probably be documented
	259	somewhere.
	260
	261	More importantly, Bison does not currently allow NUL bytes in
	262	tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
	263	the source code. This should get fixed.
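The NUL byte problem is easy to reproduce in plain C: any code that treats
a token as a NUL-terminated string stops at the first embedded NUL. The
sketch below contrasts that with a length-carrying representation. The
names token_text, TOKEN_LITERAL, token_equal, and host_has_8_bit_bytes are
hypothetical and do not reflect how Bison stores tokens.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* The 8-bit byte assumption mentioned above, as a checkable predicate. */
static int host_has_8_bit_bytes(void)
{
  return UCHAR_MAX == 255;
}

/* Hypothetical token representation that carries an explicit length,
   so embedded NUL bytes are ordinary content.  */
typedef struct { const char *bytes; size_t length; } token_text;

/* Build a token_text from a string literal, NUL bytes included. */
#define TOKEN_LITERAL(s) { (s), sizeof (s) - 1 }

/* Compare two tokens byte for byte over their full length. */
static int token_equal(token_text a, token_text b)
{
  assert(a.bytes && b.bytes);
  return a.length == b.length && memcmp(a.bytes, b.bytes, a.length) == 0;
}
```

With "x\0y" and "x\0z", strcmp sees two equal one-byte strings, while
token_equal sees two different three-byte tokens.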
	264
	265	* Broken options?
	266	** %token-table
	267	** Skeleton strategy
	268	Must we keep %token-table?
	269
	270	* Precedence
	271
	272	** Partial order
	273	It is unfortunate that there is a total order for precedence. It
	274	makes it impossible to have modular precedence information. We should
	275	move to partial orders (sounds like series/parallel orders to me).
	276
	277	** RR conflicts
	278	See if we can use precedence between rules to solve RR conflicts. See
	279	what POSIX says.
	280
	281
	282	* $undefined
	283	From Hans:
	284	- If the Bison-generated parser experiences an undefined number in the
	285	character range, that character is written out in diagnostic messages, in
	286	addition to the $undefined value.
	287
	288	Suggest: Change the name $undefined to undefined; looks better in outputs.
	289
	290
	291	* Default Action
	292	From Hans:
	293	- For use with my C++ parser, I transported the "switch (yyn)" statement
	294	that Bison writes to the bison.simple skeleton file. This way, I can remove
	295	the current default rule $$ = $1 implementation, which causes a double
	296	assignment to $$ which may not be OK under C++, replacing it with a
	297	"default:" part within the switch statement.
	298
	299	Note that the default rule $$ = $1, when typed, is perfectly OK under C,
	300	but in the C++ implementation I made, this rule is different from
	301	$<type_name>$ = $<type_name>1. I therefore think that one should implement
	302	a Bison option where every typed default rule is explicitly written out
	303	(same typed rules can of course be grouped together).
	304
	305	* Pre and post actions.
	306	From: Florian Krohm <florian@edamail.fishkill.ibm.com>
	307	Subject: YYACT_EPILOGUE
	308	To: bug-bison@gnu.org
	309	X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago
	310
	311	The other day I had the need for explicitly building the parse tree. I
	312	used %locations for that and defined YYLLOC_DEFAULT to call a function
	313	that returns the tree node for the production. Easy. But I also needed
	314	to assign the S-attribute to the tree node. That cannot be done in
	315	YYLLOC_DEFAULT, because it is invoked before the action is executed.
	316	The way I solved this was to define a macro YYACT_EPILOGUE that would
	317	be invoked after the action. For reasons of symmetry I also added
	318	YYACT_PROLOGUE. Although I had no use for that I can envision how it
	319	might come in handy for debugging purposes.
	320	All that is needed is to add
	321
	322	#if YYLSP_NEEDED
	323	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
	324	#else
	325	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
	326	#endif
	327
	328	at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.
	329
	330	I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
	331	to bison. If you're interested, I'll work on a patch.
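As a sketch of how such hooks might behave, here is a toy reduction in C
with both macros defaulting to no-ops unless the user defines them. Only
the macro names and the three-argument form come from the message above;
tree_builder, annotate_node, and reduce_sum are hypothetical, and the real
hook would sit in the skeleton around the "switch (yyn)" rather than in a
hand-written function.

```c
#include <assert.h>

/* Hypothetical user-side state: counts the nodes that got a value. */
typedef struct { int nodes_annotated; int last_value; } tree_builder;
static tree_builder builder;

static void annotate_node(int value, int rhs_len)
{
  (void) rhs_len;
  builder.nodes_annotated++;
  builder.last_value = value;
}

/* User-side definition of the epilogue hook. */
#define YYACT_EPILOGUE(Val, Rhs, Len) annotate_node((Val), (Len))

/* Skeleton-side fallbacks: hooks expand to nothing when undefined. */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Val, Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

/* Stand-in for one reduction: the action sums the YYLEN values on top
   of the stack, where YYVSP points at the topmost value.  */
static int reduce_sum(const int *yyvsp, int yylen)
{
  int yyval = 0;
  assert(yyvsp && yylen >= 0);
  YYACT_PROLOGUE(yyval, (yyvsp - yylen), yylen);
  for (int i = 1; i <= yylen; i++)
    yyval += yyvsp[i - yylen];          /* $i */
  YYACT_EPILOGUE(yyval, (yyvsp - yylen), yylen);
  return yyval;
}
```

Because the epilogue runs after the action, it sees the final yyval, which
is what YYLLOC_DEFAULT cannot offer.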
	332
	333	* Better graphics
	334	Equip the parser with a means to create the (visual) parse tree.
	335
	336	* Complaint submessage indentation.
	337	We already have an implementation that works fairly well for named
	338	reference messages, but it would be nice to use it consistently for all
	339	submessages from Bison. For example, the "previous definition"
	340	submessage or the list of correct values for a %define variable might
	341	look better with indentation.
	342
	343	However, the current implementation makes the assumption that the
	344	location printed on the first line is not usually much shorter than the
	345	locations printed on the submessage lines that follow. That assumption
	346	may not hold true as often for some kinds of submessages especially if
	347	we ever support multiple grammar files.
	348
	349	Here's a proposal for how a new implementation might look:
	350
	351	http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00086.html
	352
	353
	354	Local Variables:
	355	mode: outline
	356	coding: utf-8
	357	End:
	358
	359	-----
	360
	361	Copyright (C) 2001-2004, 2006, 2008-2013 Free Software Foundation, Inc.
	362
	363	This file is part of Bison, the GNU Compiler Compiler.
	364
	365	This program is free software: you can redistribute it and/or modify
	366	it under the terms of the GNU General Public License as published by
	367	the Free Software Foundation, either version 3 of the License, or
	368	(at your option) any later version.
	369
	370	This program is distributed in the hope that it will be useful,
	371	but WITHOUT ANY WARRANTY; without even the implied warranty of
	372	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	373	GNU General Public License for more details.
	374
	375	You should have received a copy of the GNU General Public License
	376	along with this program. If not, see <http://www.gnu.org/licenses/>.