git.saurik.com Git - bison.git/blame_incremental

	1	-- outline --
	2
	3	* Short term
	4	** Use b4_symbol in all the skeletons
	5	Then remove the older system, including the tables generated by
	6	output.c
	7
	8	** Update the documentation on gnu.org
	9
	10	** Get rid of fake #lines [Bison: ...]
	11	Possibly as simple as checking whether the column number is nonnegative.
	12
	13	I have seen messages like the following from GCC.
	14
	15	<built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory
	16
	17
	18	** Document %define assert
	19
	20	** Discuss %printer/%destroy in the case of C++.
	21	It would be very nice to provide the symbol classes with an operator<<
	22	and a destructor. Unfortunately the syntax we have chosen for
	23	%destroy and %printer makes them hard to reuse. For instance, the user
	24	is invited to write something like
	25
	26	%printer { debug_stream() << $$; } <my_type>;
	27
	28	which is hard to reuse elsewhere since it wants to use
	29	"debug_stream()" to find the stream to use. The same applies to
	30	%destroy: we told the user she could use the members of the Parser
	31	class in the printers/destructors, which is not good for an operator<<
	32	since it is no longer bound to a particular parser, it's just a
	33	(standalone symbol).
	34
	35	** Rename LR0.cc
	36	as lr0.cc, why upper case?
	37
	38	** bench several bisons.
	39	Enhance bench.pl with %b to run different bisons.
	40
	41	** Use b4_symbol everywhere.
	42	Move its definition to a more standard place and deploy it in other
	43	skeletons.
	44
	45	* Various
	46	** YYPRINT
	47	glr.c inherits its symbol_print function from c.m4, which supports
	48	YYPRINT. But to use YYPRINT yytoknum is needed, which is not defined by
	49	glr.c.
	50
	51	Anyway, IMHO YYPRINT is obsolete and should be restricted to yacc.c.
	52
	53	** YYERRCODE
	54	Defined to 256, but not used, not documented. Probably the token
	55	number for the error token, which POSIX wants to be 256, but which
	56	Bison might renumber if the user used number 256. Keep fix and doc?
	57	Throw away?
	58
	59	We could (should?) also treat the case of the undef_token, which is
	60	numbered 257 for yylex, and 2 internally. Both appear for instance in
	61	toknum:
	62
	63	const unsigned short int
	64	parser::yytoken_number_[] =
	65	{
	66	0, 256, 257, 258, 259, 260, 261, 262, 263, 264,
	67
	68	while here
	69
	70	enum yytokentype {
	71	TOK_EOF = 0,
	72	TOK_EQ = 258,
	73
	74	so both 256 and 257 are "mysterious".
	75
	76	const char*
	77	const parser::yytname_[] =
	78	{
	79	"\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",
	80
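The relation between the two numberings can be sketched in C. This is
an illustration only: the table is a shortened, hypothetical copy of
the toknum excerpt above, and external_to_internal is a made-up helper,
not a function taken from any Bison skeleton.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, shortened copy of the toknum table quoted above:
   entry i is the number yylex uses for internal symbol i.  */
static const unsigned short toknum[] = { 0, 256, 257, 258, 259, 260 };

enum { NSYMS = sizeof toknum / sizeof toknum[0] };
enum { UNDEF_INTERNAL = 2 };    /* internal number of $undefined */

/* Map a yylex number to an internal symbol number by linear search.
   Anything that is not in the table is treated as $undefined.  */
static int
external_to_internal (int external)
{
  assert (UNDEF_INTERNAL < NSYMS);
  for (size_t i = 0; i < NSYMS; ++i)
    if (toknum[i] == external)
      return (int) i;
  return UNDEF_INTERNAL;
}
```

With this table, 256 and 257 map to internal symbols 1 ("error") and 2
("$undefined"), which is why neither shows up in yytokentype.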
	81
	82	** YYFAIL
	83	It seems to be really obsolete now; shall we remove it?
	84
	85	** YYBACKUP
	86	There is no test about it, no examples in the doc, and I'm not sure
	87	what it should look like. For instance what follows crashes.
	88
	89	%error-verbose
	90	%debug
	91	%pure-parser
	92	%code {
	93	# include <stdio.h>
	94	# include <stdlib.h>
	95	# include <assert.h>
	96
	97	static void yyerror (const char *msg);
	98	static int yylex (YYSTYPE *yylval);
	99	}
	100	%%
	101	exp:
	102	'a' { printf ("a: %d\n", $1); }
	103	| 'b' { YYBACKUP('a', 123); }
	104	;
	105	%%
	106	static int
	107	yylex (YYSTYPE *yylval)
	108	{
	109	static char const input[] = "b";
	110	static size_t toknum;
	111	assert (toknum < sizeof input);
	112	*yylval = (toknum + 1) * 10;
	113	return input[toknum++];
	114	}
	115
	116	static void
	117	yyerror (const char *msg)
	118	{
	119	fprintf (stderr, "%s\n", msg);
	120	}
	121
	122	int
	123	main (void)
	124	{
	125	yydebug = !!getenv("YYDEBUG");
	126	return yyparse ();
	127	}
	128
	129	** yychar == yyempty_
	130	The code in yyerrlab reads:
	131
	132	if (yychar <= YYEOF)
	133	{
	134	/* Return failure if at end of input. */
	135	if (yychar == YYEOF)
	136	YYABORT;
	137	}
	138
	139	There are only two yychar that can be <= YYEOF: YYEMPTY and YYEOF.
	140	But I can't produce the situation where yychar is YYEMPTY here, is it
	141	really possible? The test suite does not exercise this case.
	142
	143	This shows that it would be worthwhile to add skeleton coverage
	144	analysis to the test suite.
	145
	146	** Table definitions
	147	It should be very easy to factor the definition of the various tables,
	148	including the separation between declaration and definition. See for
	149	instance b4_table_define in lalr1.cc. This way, we could even factor
	150	C vs. C++ definitions.
	151
	152	* From lalr1.cc to yacc.c
	153	** Single stack
	154	Merging the three stacks in lalr1.cc simplified the code, prompted
	155	other improvements and also made it faster (probably because memory
	156	management is performed once instead of three times). I suggest that
	157	we do the same in yacc.c.
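
A minimal sketch of what a single stack could look like in C. All the
names here (stack_item, parser_stack, stack_push, stack_pop) are
hypothetical, and plain ints stand in for YYSTYPE and YYLTYPE; this is
not the layout used by lalr1.cc or yacc.c.

```c
#include <assert.h>
#include <stdlib.h>

/* One record per stack slot instead of three parallel arrays
   (states, semantic values, locations).  */
typedef struct
{
  int state;      /* parser state */
  int value;      /* stands in for YYSTYPE */
  int location;   /* stands in for YYLTYPE */
} stack_item;

typedef struct
{
  stack_item *items;
  size_t size;
  size_t capacity;
} parser_stack;

/* Push one record.  Growing the stack takes a single realloc rather
   than one per array.  Return 0 on success, -1 on memory exhaustion.  */
static int
stack_push (parser_stack *s, int state, int value, int location)
{
  if (s->size == s->capacity)
    {
      size_t new_capacity = s->capacity ? 2 * s->capacity : 16;
      stack_item *p = realloc (s->items, new_capacity * sizeof *p);
      if (!p)
        return -1;
      s->items = p;
      s->capacity = new_capacity;
    }
  s->items[s->size].state = state;
  s->items[s->size].value = value;
  s->items[s->size].location = location;
  ++s->size;
  return 0;
}

/* Pop N records at once, as a reduction by a rule of length N does.  */
static void
stack_pop (parser_stack *s, size_t n)
{
  assert (n <= s->size);
  s->size -= n;
}
```

A parser_stack initialized to { NULL, 0, 0 } is ready for stack_push;
the caller frees items when done.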
	158
	159	** yysyntax_error
	160	In lalr1.cc we invoke it with the translated lookahead (yytoken), and
	161	yacc.c uses yychar. I don't see why.
	162
	163	** yysyntax_error
	164	The use of switch to select yyfmt in lalr1.cc seems simpler than
	165	what's done in yacc.c.
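
For illustration, selecting the format with a switch could look roughly
like this in C. The message strings, the limit of five tokens, and the
names syntax_error_format and format_arity are assumptions made for this
sketch, not code copied from either skeleton.

```c
#include <assert.h>
#include <string.h>

/* Return the message format for a syntax error that names COUNT
   tokens: the unexpected one followed by the expected ones.  */
static const char *
syntax_error_format (int count)
{
  switch (count)
    {
    case 1: return "syntax error, unexpected %s";
    case 2: return "syntax error, unexpected %s, expecting %s";
    case 3: return "syntax error, unexpected %s, expecting %s or %s";
    case 4: return "syntax error, unexpected %s, expecting %s or %s or %s";
    case 5: return "syntax error, unexpected %s, expecting %s or %s or %s or %s";
    default: return "syntax error";   /* none, or too many to list */
    }
}

/* Count the %s directives in FMT, to check that a format matches the
   number of token names it will be given.  */
static int
format_arity (const char *fmt)
{
  assert (fmt != NULL);
  int n = 0;
  for (const char *p = strstr (fmt, "%s"); p; p = strstr (p + 2, "%s"))
    ++n;
  return n;
}
```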
	166
	167	* Header guards
	168
	169	From François: should we keep the directory part in the CPP guard?
	170
	171
	172	* Yacc.c: CPP Macros
	173
	174	Do some people use YYPURE, YYLSP_NEEDED like we do in the test suite?
	175	They should not: it is not documented. But if they need to, let's
	176	find something clean (not like YYLSP_NEEDED...).
	177
	178
	179	* Installation
	180
	181	* Documentation
	182	Before releasing, make sure the documentation ("Understanding your
	183	parser") refers to the current `output' format.
	184
	185	* Report
	186
	187	** GLR
	188	How would Paul like to display the conflicted actions? In particular,
	189	what should we do when two reductions are possible on a given lookahead token, but one is
	190	part of $default? Should we make the two reductions explicit, or just
	191	keep $default? See the following point.
	192
	193	** Disabled Reductions
	194	See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
	195	what we want to do.
	196
	197	** Documentation
	198	Extend with error productions. The hard part will probably be finding
	199	the right rule so that a single state does not exhibit too many yet
	200	undocumented ``features''. Maybe an empty action ought to be
	201	presented too. Shall we try to make a single grammar with all these
	202	features, or should we have several very small grammars?
	203
	204	** --report=conflict-path
	205	Provide better assistance for understanding the conflicts by providing
	206	a sample text exhibiting the (LALR) ambiguity. See the paper from
	207	DeRemer and Pennello: they already provide the algorithm.
	208
	209	** Statically check for potential ambiguities in GLR grammars. See
	210	<http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.
	211
	212
	213	* Extensions
	214
	215	** Labeling the symbols
	216	Have a look at the Lemon parser generator: instead of $1, $2 etc. they
	217	can name the values. This is much more pleasant. For instance:
	218
	219	exp (res): exp (a) '+' exp (b) { $res = $a + $b; };
	220
	221	I love this. I have been bitten too often by the removal of the
	222	symbol, and forgetting to shift all the $n to $n-1. If you are
	223	unlucky, it compiles...
	224
	225	But instead of using $a etc., we can use regular variables. And
	226	instead of using (), I propose to use `:' (again). Paul suggests
	227	supporting `->' in addition to `:' to separate LHS and RHS. In other
	228	words:
	229
	230	r:exp -> a:exp '+' b:exp { r = a + b; };
	231
	232	That requires a significant improvement of the grammar parser. Using
	233	GLR would be nice. It also requires that Bison know the type of the
	234	symbols (which will be useful for %include anyway). So we have some
	235	time before...
	236
	237	Note that there remains the problem of locations: `@r'?
	238
	239
	240	** $-1
	241	We should find a means to provide an access to values deep in the
	242	stack. For instance, instead of
	243
	244	baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }
	245
	246	we should be able to have:
	247
	248	foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }
	249
	250	Or something like this.
	251
	252	** %if and the like
	253	It should be possible to have %if/%else/%endif. The implementation is
	254	not clear: should it be lexical or syntactic? Vadim Maslow thinks it
	255	must be in the scanner: we must not parse what is in a switched off
	256	part of %if. Akim Demaille thinks it should be in the parser, so as
	257	to avoid falling into another CPP mistake.
	258
	259	** XML Output
	260	There are a couple of available extensions of Bison targeting some XML
	261	output. Some day we should consider including them. One issue is
	262	that they seem to be quite orthogonal to the parsing technique, and
	263	seem to depend mostly on the possibility to have some code triggered
	264	for each reduction. As a matter of fact, such hooks could also be
	265	used to generate the yydebug traces. Some generic scheme probably
	266	exists in there.
	267
	268	XML output for GNU Bison and gcc
	269	http://www.cs.may.ie/~jpower/Research/bisonXML/
	270
	271	XML output for GNU Bison
	272	http://yaxx.sourceforge.net/
	273
	274	* Unit rules
	275	Maybe we could expand unit rules, i.e., transform
	276
	277	exp: arith | bool;
	278	arith: exp '+' exp;
	279	bool: exp '&' exp;
	280
	281	into
	282
	283	exp: exp '+' exp | exp '&' exp;
	284
	285	when there are no actions. This can significantly speed up some
	286	grammars. I can't find the papers. In particular the book `LR
	287	parsing: Theory and Practice' is impossible to find, but according to
	288	`Parsing Techniques: a Practical Guide', it includes information about
	289	this issue. Does anybody have it?
	290
	291
	292
	293	* Documentation
	294
	295	** History/Bibliography
	296	Some history of Bison and some bibliography would be most welcome.
	297	Are there any Texinfo standards for bibliography?
	298
	299
	300
	301	* Java, Fortran, etc.
	302
	303
	304	* Coding system independence
	305	Paul notes:
	306
	307	Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
	308	255). It also assumes that the 8-bit character encoding is
	309	the same for the invocation of 'bison' as it is for the
	310	invocation of 'cc', but this is not necessarily true when
	311	people run bison on an ASCII host and then use cc on an EBCDIC
	312	host. I don't think these topics are worth our time
	313	addressing (unless we find a gung-ho volunteer for EBCDIC or
	314	PDP-10 ports :-) but they should probably be documented
	315	somewhere.
	316
	317	More importantly, Bison does not currently allow NUL bytes in
	318	tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
	319	the source code. This should get fixed.
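
One plausible way such a limitation arises (an assumption for the sake
of illustration; the note above does not say where Bison loses the
byte) is code that handles token text as a NUL-terminated C string. A
small C sketch, with hypothetical names:

```c
#include <assert.h>
#include <string.h>

/* A token alias with an embedded NUL, as in the "x\0y" example.  */
static const char alias[] = "x\0y";

/* Its real length: 'x', NUL, 'y', without the terminating NUL that
   the compiler appends.  */
enum { ALIAS_SIZE = sizeof alias - 1 };

/* Number of bytes that a C-string API would see in BYTES, whose real
   length is SIZE.  BYTES must be NUL-terminated.  */
static size_t
visible_as_c_string (const char *bytes, size_t size)
{
  assert (bytes != NULL);
  size_t n = strlen (bytes);    /* stops at the first NUL */
  return n < size ? n : size;
}
```

Here visible_as_c_string (alias, ALIAS_SIZE) sees one byte out of three,
so supporting such tokens means carrying an explicit length.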
	320
	321	* --graph
	322	Show reductions.
	323
	324	* Broken options ?
	325	** %token-table
	326	** Skeleton strategy
	327	Must we keep %token-table?
	328
	329	* BTYacc
	330	See if we can integrate backtracking in Bison. Charles-Henri de
	331	Boysson <de-boy_c@epita.fr> has been working on this, but never gave
	332	the results.
	333
	334	Vadim Maslow, the maintainer of BTYacc, was once contacted. Adjusting
	335	the Bison grammar parser will be needed to support some extra BTYacc
	336	features. This is less urgent.
	337
	338	** Keeping the conflicted actions
	339	First, analyze the differences between byacc and btyacc (I'm referring
	340	to the executables). Find where the conflicts are preserved.
	341
	342	** Compare with the GLR tables
	343	See how compatible the BTYacc approach and the GLR adjustments in
	344	Bison are. As much as possible one should try to use the
	345	same implementation in the Bison executables. I insist: it should be
	346	very feasible to use the very same conflict tables.
	347
	348	** Adjust the skeletons
	349	Import the skeletons for C and C++.
	350
	351
	352	* Precedence
	353
	354	** Partial order
	355	It is unfortunate that there is a total order for precedence. It
	356	makes it impossible to have modular precedence information. We should
	357	move to partial orders (sounds like series/parallel orders to me).
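
To make the idea concrete, here is a small C sketch of precedence as a
partial order. Everything in it is hypothetical (the names, the fixed
number of classes, the matrix representation); it only shows that two
classes may end up unrelated, which a total order cannot express.

```c
#include <assert.h>
#include <stdbool.h>

enum { NCLASSES = 4 };    /* hypothetical number of precedence classes */

typedef enum { PREC_LOWER, PREC_HIGHER, PREC_EQUAL, PREC_UNRELATED } prec_cmp;

/* above[a][b] is true when class A binds tighter than class B.  */
static bool above[NCLASSES][NCLASSES];

/* Record that class HI binds tighter than class LO.  */
static void
prec_declare (int hi, int lo)
{
  assert (0 <= hi && hi < NCLASSES && 0 <= lo && lo < NCLASSES);
  above[hi][lo] = true;
}

/* Transitive closure (Warshall's algorithm): if A is above K and K is
   above B, then A is above B.  */
static void
prec_close (void)
{
  for (int k = 0; k < NCLASSES; ++k)
    for (int i = 0; i < NCLASSES; ++i)
      for (int j = 0; j < NCLASSES; ++j)
        if (above[i][k] && above[k][j])
          above[i][j] = true;
}

/* Compare two classes; unlike with a total order, the answer may be
   PREC_UNRELATED.  */
static prec_cmp
prec_compare (int a, int b)
{
  if (a == b)
    return PREC_EQUAL;
  if (above[a][b])
    return PREC_HIGHER;
  if (above[b][a])
    return PREC_LOWER;
  return PREC_UNRELATED;
}
```

Two grammar modules could then each declare their own chains, and a
conflict between tokens of unrelated classes would simply stay
unresolved.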
	358
	359	** RR conflicts
	360	See if we can use precedence between rules to solve RR conflicts. See
	361	what POSIX says.
	362
	363
	364	* $undefined
	365	From Hans:
	366	- If the Bison generated parser encounters an undefined number in the
	367	character range, that character is written out in diagnostic messages, in
	368	addition to the $undefined value.
	369
	370	Suggest: Change the name $undefined to undefined; looks better in outputs.
	371
	372
	373	* Default Action
	374	From Hans:
	375	- For use with my C++ parser, I transported the "switch (yyn)" statement
	376	that Bison writes to the bison.simple skeleton file. This way, I can remove
	377	the current default rule $$ = $1 implementation, which causes a double
	378	assignment to $$ which may not be OK under C++, replacing it with a
	379	"default:" part within the switch statement.
	380
	381	Note that the default rule $$ = $1, when typed, is perfectly OK under C,
	382	but in the C++ implementation I made, this rule is different from
	383	$<type_name>$ = $<type_name>1. I therefore think that one should implement
	384	a Bison option where every typed default rule is explicitly written out
	385	(same typed rules can of course be grouped together).
	386
	387	* Pre and post actions.
	388	From: Florian Krohm <florian@edamail.fishkill.ibm.com>
	389	Subject: YYACT_EPILOGUE
	390	To: bug-bison@gnu.org
	391	X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago
	392
	393	The other day I had the need for explicitly building the parse tree. I
	394	used %locations for that and defined YYLLOC_DEFAULT to call a function
	395	that returns the tree node for the production. Easy. But I also needed
	396	to assign the S-attribute to the tree node. That cannot be done in
	397	YYLLOC_DEFAULT, because it is invoked before the action is executed.
	398	The way I solved this was to define a macro YYACT_EPILOGUE that would
	399	be invoked after the action. For reasons of symmetry I also added
	400	YYACT_PROLOGUE. Although I had no use for that I can envision how it
	401	might come in handy for debugging purposes.
	402	All that is needed is to add
	403
	404	#if YYLSP_NEEDED
	405	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
	406	#else
	407	YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
	408	#endif
	409
	410	at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.
	411
	412	I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
	413	to bison. If you're interested, I'll work on a patch.
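
A rough C sketch of how such hooks might be wired, using the
three-argument form from the message. The reduce function and the
actions_run counter are hypothetical stand-ins, with ints in place of
semantic values, and this is not code from bison.simple.

```c
#include <assert.h>

/* Hypothetical user definition: count the actions that have run.  */
static int actions_run = 0;
#define YYACT_EPILOGUE(Val, Rhs, Len) \
  ((void) (Val), (void) (Rhs), (void) (Len), (void) ++actions_run)

/* What the skeleton would provide: each hook expands to nothing
   unless the user defined it.  */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Val, Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

/* Stand-in for one reduction: the "action" sums the LEN values of the
   right-hand side, with the hooks invoked around it.  */
static int
reduce (const int *rhs, int len)
{
  assert (len >= 0);
  int val = 0;
  YYACT_PROLOGUE (val, rhs, len);
  for (int i = 0; i < len; ++i)   /* the user action */
    val += rhs[i];
  YYACT_EPILOGUE (val, rhs, len);
  return val;
}
```

Because the epilogue runs after the action, it sees the final value of
val, which is what YYLLOC_DEFAULT cannot offer.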
	414
	415	* Better graphics
	416	Equip the parser with a means to create the (visual) parse tree.
	417
	418	-----
	419
	420	Copyright (C) 2001, 2002, 2003, 2004, 2006, 2008 Free Software Foundation,
	421	Inc.
	422
	423	This file is part of Bison, the GNU Compiler Compiler.
	424
	425	This program is free software: you can redistribute it and/or modify
	426	it under the terms of the GNU General Public License as published by
	427	the Free Software Foundation, either version 3 of the License, or
	428	(at your option) any later version.
	429
	430	This program is distributed in the hope that it will be useful,
	431	but WITHOUT ANY WARRANTY; without even the implied warranty of
	432	MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
	433	GNU General Public License for more details.
	434
	435	You should have received a copy of the GNU General Public License
	436	along with this program. If not, see <http://www.gnu.org/licenses/>.