* Short term
** scan-code.l
Avoid variables for format strings, as then GCC cannot check them.
show_sub_messages should call show_sub_message.
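
To illustrate why literal formats matter, here is a minimal sketch (complain_sketch, report_unused and SKETCH_PRINTF are hypothetical names, not the scan-code.l API) of a printf-like helper declared with GCC's format attribute. As long as each call site passes a string literal, -Wformat can check it against the arguments; once the format comes from a variable, that check is lost.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Let GCC-compatible compilers check printf-style formats.  */
#if defined __GNUC__
# define SKETCH_PRINTF(Fmt, First) __attribute__ ((format (printf, Fmt, First)))
#else
# define SKETCH_PRINTF(Fmt, First)
#endif

/* Hypothetical helper: format a complaint into BUF.  Argument 3 is the
   format, and the arguments it consumes start at position 4.  */
static int complain_sketch (char *buf, size_t size, const char *fmt, ...)
  SKETCH_PRINTF (3, 4);

static int
complain_sketch (char *buf, size_t size, const char *fmt, ...)
{
  va_list args;
  int res;
  assert (buf && size > 0);
  va_start (args, fmt);
  res = vsnprintf (buf, size, fmt, args);
  va_end (args);
  return res;
}

/* The format is a literal at the call site, so passing, say, an int where
   %s expects a string is diagnosed at compile time by -Wformat.  */
static int
report_unused (char *buf, size_t size, const char *name, int line)
{
  return complain_sketch (buf, size, "unused variable %s at line %d",
                          name, line);
}
```

For instance, report_unused (buf, sizeof buf, "x", 3) stores "unused variable x at line 3" in buf.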

** Variable names.
What should we name `variant' and `lex_symbol'?

** Get rid of fake #lines [Bison: ...]
Possibly as simple as checking whether the column number is nonnegative.

I have seen messages like the following from GCC.

  <built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory

** Discuss %printer/%destroy in the case of C++.
It would be very nice to provide the symbol classes with an operator<<
and a destructor. Unfortunately the syntax we have chosen for
%destroy and %printer makes them hard to reuse. For instance, the user
is invited to write something like

  %printer { debug_stream() << $$; } <my_type>;

which is hard to reuse elsewhere since it relies on
"debug_stream()" to find the stream to use. The same applies to
%destroy: we told the user she could use the members of the Parser
class in the printers/destructors, which is not good for an operator<<
since it is no longer bound to a particular parser: it is just a
standalone symbol.

** Rename LR0.cc
Rename it as lr0.cc: why upper case?

** Bench several bisons.
Enhance bench.pl with %b to run different bisons.

* Various
** YYERRCODE
Defined to 256, but neither used nor documented. Probably the token
number for the error token, which POSIX wants to be 256, but which
Bison might renumber if the user used number 256. Should we keep it,
fix it and document it? Or throw it away?

Also, why don't we output the token name of the error token in the
output? It is explicitly skipped:

  /* Skip error token and tokens without identifier.  */
  if (sym != errtoken && id)

Of course there are issues with name spaces, but if we disable this
skipping we get something simpler and more consistent than the
special case YYERRCODE:

  enum yytokentype {
    error = 256,
    // ...
  };

We could (should?) also treat the case of the undef_token, which is
numbered 257 for yylex, and 2 internally. Both appear for instance in
toknum:

  const unsigned short int
  parser::yytoken_number_[] =
  {
    0, 256, 257, 258, 259, 260, 261, 262, 263, 264,

while here

  enum yytokentype {
    TOK_EOF = 0,
    TOK_EQ = 258,

so both 256 and 257 are "mysterious".

  const char*
  const parser::yytname_[] =
  {
    "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",
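
To make the two numberings concrete, here is a toy sketch of the translation from the token numbers seen by yylex to the internal symbol numbers that index yytname, using the values from the excerpts above (0 for end of input, 256 for error, 257 for the undef token, 258 and up for user tokens). The SKETCH_* names are hypothetical, and unlike the generated yytranslate table this toy grammar has no character literals, so every unknown value maps to $undefined.

```c
#include <assert.h>

/* External token numbers, as seen by yylex (values from the excerpts).  */
enum sketch_token
{
  SKETCH_EOF = 0,
  SKETCH_ERROR = 256,       /* YYERRCODE */
  SKETCH_UNDEF = 257,       /* undef_token */
  SKETCH_TOK_EQ = 258,
  SKETCH_TOK_BREAK = 259
};

/* Internal symbol numbers index this table: 0 is end of input,
   1 is "error", 2 is "$undefined", then the user tokens in order.  */
static const char *const sketch_tname[] =
{
  "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\""
};

enum { SKETCH_UNDEF_SYMBOL = 2, SKETCH_MAX_TOKEN = SKETCH_TOK_BREAK };

/* Map an external token number to an internal symbol number.  This toy
   grammar has no character literals, so anything else is $undefined.  */
static int
sketch_translate (int token)
{
  if (token == SKETCH_EOF)
    return 0;
  if (SKETCH_ERROR <= token && token <= SKETCH_MAX_TOKEN)
    return token - SKETCH_ERROR + 1;
  return SKETCH_UNDEF_SYMBOL;
}

static const char *
sketch_symbol_name (int token)
{
  int sym = sketch_translate (token);
  assert (0 <= sym
          && sym < (int) (sizeof sketch_tname / sizeof *sketch_tname));
  return sketch_tname[sym];
}
```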

** yychar == yyempty_
The code in yyerrlab reads:

  if (yychar <= YYEOF)
    {
      /* Return failure if at end of input.  */
      if (yychar == YYEOF)
        YYABORT;
    }

There are only two values of yychar that can be <= YYEOF: YYEMPTY and
YYEOF. But I can't produce the situation where yychar is YYEMPTY here;
is it really possible? The test suite does not exercise this case.

This shows that it would be useful to add skeleton coverage analysis
to the test suite.
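
For reference, here is the lookahead handling around this test restated as a standalone function, a sketch assuming yacc.c's YYEMPTY = -2 and YYEOF = 0 and the situation where the parser has just failed to reuse the lookahead after an error (yyerrstatus == 3). The enum and function names are hypothetical; LOOKAHEAD_NONE is the branch the test suite never reaches.

```c
#include <assert.h>

/* Values as in yacc.c (assumed here).  */
#define YYEMPTY (-2)
#define YYEOF 0

/* Hypothetical names for what yyerrlab does with the lookahead.  */
enum lookahead_action
{
  LOOKAHEAD_ABORT,    /* yychar == YYEOF: YYABORT */
  LOOKAHEAD_NONE,     /* yychar == YYEMPTY: nothing to discard */
  LOOKAHEAD_DISCARD   /* real token: destroy it, then yychar = YYEMPTY */
};

static enum lookahead_action
yyerrlab_lookahead (int yychar)
{
  if (yychar <= YYEOF)
    {
      /* Only YYEMPTY and YYEOF can get here.  */
      assert (yychar == YYEMPTY || yychar == YYEOF);
      return yychar == YYEOF ? LOOKAHEAD_ABORT : LOOKAHEAD_NONE;
    }
  return LOOKAHEAD_DISCARD;
}
```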

** Table definitions
It should be very easy to factor the definition of the various tables,
including the separation between declaration and definition. See for
instance b4_table_define in lalr1.cc. This way, we could even factor
C vs. C++ definitions.

* From lalr1.cc to yacc.c
** Single stack
Merging the three stacks in lalr1.cc simplified the code, prompted
other improvements and also made it faster (probably because memory
management is performed once instead of three times). I suggest that
we do the same in yacc.c.
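
A minimal sketch of what the merged stack could look like in yacc.c, assuming placeholder types (an int state, an int semantic value and a two-field location instead of yytype_int16, YYSTYPE and YYLTYPE) and hypothetical names: each slot holds all three components, so growing the stack takes one realloc instead of three, and a reduction pops all of them with a single subtraction.

```c
#include <assert.h>
#include <stdlib.h>

/* Placeholder for YYLTYPE.  */
typedef struct { int first_line, first_column; } sketch_location;

/* One record per stack slot instead of three parallel stacks.  */
typedef struct
{
  int state;
  int value;
  sketch_location location;
} sketch_stack_symbol;

typedef struct
{
  sketch_stack_symbol *items;
  size_t size;
  size_t capacity;
} sketch_stack;

/* Push one symbol; a single realloc grows states, values and locations
   together.  Returns 0 on success, -1 on memory exhaustion.  */
static int
sketch_stack_push (sketch_stack *s, int state, int value,
                   sketch_location loc)
{
  if (s->size == s->capacity)
    {
      size_t cap = s->capacity ? 2 * s->capacity : 8;
      sketch_stack_symbol *items = realloc (s->items, cap * sizeof *items);
      if (!items)
        return -1;
      s->items = items;
      s->capacity = cap;
    }
  s->items[s->size].state = state;
  s->items[s->size].value = value;
  s->items[s->size].location = loc;
  s->size++;
  return 0;
}

/* Pop N symbols at once, as the reduction of a rule of length N does.  */
static void
sketch_stack_pop (sketch_stack *s, size_t n)
{
  assert (n <= s->size);
  s->size -= n;
}
```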

** yysyntax_error
The code in glr.c and yacc.c is really alike; we can certainly factor
some parts.


* Report

** Figures
Some statistics about the grammar and the parser would be useful,
especially when asking the user to send some information about the
grammars she is working on. We should probably also include some
information about the variables (I'm not sure, for instance, that we
even specify which LR variant was used).

** GLR
How would Paul like to display the conflicted actions? In particular,
what should we show when two reductions are possible on a given
lookahead token, but one is part of $default? Should we make the two
reductions explicit, or just keep $default? See the following point.

** Disabled Reductions
See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
what we want to do.

** Documentation
Extend with error productions. The hard part will probably be finding
the right rule so that a single state does not exhibit too many yet
undocumented ``features''. Maybe an empty action ought to be
presented too. Shall we try to make a single grammar with all these
features, or should we have several very small grammars?

** --report=conflict-path
Provide better assistance for understanding the conflicts by providing
a sample text exhibiting the (LALR) ambiguity. See the paper from
DeRemer and Pennello: they already provide the algorithm.

** Statically check for potential ambiguities in GLR grammars. See
<http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.


* Extensions

** $-1
We should find a means to provide access to values deep in the
stack. For instance, instead of

  baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }

we should be able to have:

  foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }

Or something like this.

** %if and the like
It should be possible to have %if/%else/%endif. The implementation is
not clear: should it be lexical or syntactic? Vadim Maslow thinks it
must be in the scanner: we must not parse what is in a switched-off
part of %if. Akim Demaille thinks it should be in the parser, so as
to avoid falling into another CPP mistake.

** XML Output
There are a couple of available extensions of Bison targeting some XML
output. Some day we should consider including them. One issue is
that they seem to be quite orthogonal to the parsing technique, and
seem to depend mostly on the possibility of having some code triggered
for each reduction. As a matter of fact, such hooks could also be
used to generate the yydebug traces. Some generic scheme probably
exists in there.

XML output for GNU Bison and gcc
  http://www.cs.may.ie/~jpower/Research/bisonXML/

XML output for GNU Bison
  http://yaxx.sourceforge.net/

* Unit rules
Maybe we could expand unit rules, i.e., transform

  exp: arith | bool;
  arith: exp '+' exp;
  bool: exp '&' exp;

into

  exp: exp '+' exp | exp '&' exp;

when there are no actions. This can significantly speed up some
grammars. I can't find the papers. In particular the book `LR
parsing: Theory and Practice' is impossible to find, but according to
`Parsing Techniques: a Practical Guide', it includes information about
this issue. Does anybody have it?
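
To pin down the transformation, here is a toy sketch on a hypothetical string representation of rules (not Bison's internal structures): an action-less rule whose right-hand side is a single nonterminal is replaced by that nonterminal's alternatives, recursively. It assumes there is no cycle of unit rules, and it does not remove the nonterminals that become useless (arith and bool above).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy rule "lhs: rhs", the rhs being a space-separated string.  */
struct toy_rule
{
  const char *lhs;
  const char *rhs;
  int has_action;
};

/* Nonzero if SYM is the left-hand side of some rule.  */
static int
toy_is_nonterminal (const struct toy_rule *g, size_t n, const char *sym)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (g[i].lhs, sym) == 0)
      return 1;
  return 0;
}

/* Append the alternatives of SYM to OUT, separated by " | ", expanding
   action-less unit rules.  DEPTH guards against unit-rule cycles.  */
static void
toy_expand (const struct toy_rule *g, size_t n, const char *sym,
            char *out, size_t size, int depth)
{
  assert (depth <= (int) n);
  for (size_t i = 0; i < n; i++)
    {
      if (strcmp (g[i].lhs, sym) != 0)
        continue;
      if (!g[i].has_action && toy_is_nonterminal (g, n, g[i].rhs))
        toy_expand (g, n, g[i].rhs, out, size, depth + 1);
      else
        {
          size_t len = strlen (out);
          snprintf (out + len, size - len, "%s%s",
                    len ? " | " : "", g[i].rhs);
        }
    }
}

/* Write the expanded alternatives of LHS into OUT.  */
static void
toy_expand_unit_rules (const struct toy_rule *g, size_t n,
                       const char *lhs, char *out, size_t size)
{
  assert (size > 0);
  out[0] = '\0';
  toy_expand (g, n, lhs, out, size, 0);
}
```

With the three rules above, toy_expand_unit_rules on "exp" yields exp '+' exp | exp '&' exp; giving the exp: arith alternative an action keeps arith as is.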


* Documentation

** History/Bibliography
Some history of Bison and some bibliography would be most welcome.
Are there any Texinfo standards for bibliography?

* Coding system independence
Paul notes:

  Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
  255). It also assumes that the 8-bit character encoding is
  the same for the invocation of 'bison' as it is for the
  invocation of 'cc', but this is not necessarily true when
  people run bison on an ASCII host and then use cc on an EBCDIC
  host. I don't think these topics are worth our time
  addressing (unless we find a gung-ho volunteer for EBCDIC or
  PDP-10 ports :-) but they should probably be documented
  somewhere.

  More importantly, Bison does not currently allow NUL bytes in
  tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
  the source code. This should get fixed.
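
A sketch of the representation change the NUL-byte issue calls for, assuming token strings are currently handled as NUL-terminated C strings: strlen sees "x\0y" as "x", so the length must travel with the bytes. The struct, macro and function names are hypothetical; the C11 static assertion simply turns the undocumented 8-bit assumption into a compile-time check.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Make the current 8-bit assumption explicit (C11).  */
_Static_assert (UCHAR_MAX == 255, "Bison assumes 8-bit bytes");

/* A token string that may contain NUL bytes: bytes plus explicit length.  */
struct byte_string
{
  const char *bytes;
  size_t len;
};

/* Build from a string literal: sizeof counts embedded NULs, and we drop
   only the terminating one.  */
#define BYTE_STRING_LITERAL(Lit) \
  ((struct byte_string) { (Lit), sizeof (Lit) - 1 })

/* Compare by length and bytes, never stopping at an embedded NUL.  */
static int
byte_string_equal (struct byte_string a, struct byte_string b)
{
  assert (a.bytes && b.bytes);
  return a.len == b.len && memcmp (a.bytes, b.bytes, a.len) == 0;
}
```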

* --graph
Show reductions.

* Broken options?
** %token-table
** Skeleton strategy
Must we keep %token-table?

* Precedence

** Partial order
It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders (sounds like series/parallel orders to me).
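
One possible reading of the partial-order idea, as a hypothetical sketch: record only the explicit "binds tighter than" declarations, take their transitive closure (Warshall's algorithm), and treat two operators with no path between them as incomparable, so a conflict between them would still be reported rather than silently resolved. The names, the fixed bound PREC_MAX and the API are assumptions; a real implementation would also have to reject cycles.

```c
#include <assert.h>
#include <stdbool.h>

enum { PREC_MAX = 8 };

/* tighter[a][b] is true if operator A binds tighter than operator B.  */
struct prec_order
{
  int n;
  bool tighter[PREC_MAX][PREC_MAX];
};

/* Record the declaration "A binds tighter than B".  */
static void
prec_declare (struct prec_order *p, int a, int b)
{
  assert (0 <= a && a < p->n && 0 <= b && b < p->n);
  p->tighter[a][b] = true;
}

/* Warshall's algorithm: make the relation transitive.  */
static void
prec_close (struct prec_order *p)
{
  for (int k = 0; k < p->n; k++)
    for (int i = 0; i < p->n; i++)
      for (int j = 0; j < p->n; j++)
        if (p->tighter[i][k] && p->tighter[k][j])
          p->tighter[i][j] = true;
}

enum prec_cmp { PREC_TIGHTER, PREC_LOOSER, PREC_INCOMPARABLE };

static enum prec_cmp
prec_compare (const struct prec_order *p, int a, int b)
{
  if (p->tighter[a][b])
    return PREC_TIGHTER;
  if (p->tighter[b][a])
    return PREC_LOOSER;
  return PREC_INCOMPARABLE;
}
```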

** RR conflicts
See if we can use precedence between rules to solve RR conflicts. See
what POSIX says.


* $undefined
From Hans:
- If the Bison generated parser experiences an undefined number in the
character range, that character is written out in diagnostic messages, in
addition to the $undefined value.

Suggest: Change the name $undefined to undefined; looks better in outputs.
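
A sketch of Hans's suggestion, assuming the parser still has the raw value returned by yylex when it reports the error: values in the character range are shown as the character (or as an octal escape if it is not printable) next to the symbol name, spelled "undefined" as suggested. The function name and message format are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

/* Describe an undefined token for a diagnostic.  RAW is the value
   returned by yylex.  Returns the length of the description.  */
static int
describe_undefined (int raw, char *buf, size_t size)
{
  assert (buf && size > 0);
  if (0 <= raw && raw <= UCHAR_MAX)
    {
      if (isprint (raw))
        return snprintf (buf, size, "undefined ('%c')", raw);
      return snprintf (buf, size, "undefined ('\\%03o')", (unsigned) raw);
    }
  return snprintf (buf, size, "undefined (%d)", raw);
}
```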


* Default Action
From Hans:
- For use with my C++ parser, I transported the "switch (yyn)" statement
that Bison writes to the bison.simple skeleton file. This way, I can remove
the current default rule $$ = $1 implementation, which causes a double
assignment to $$ which may not be OK under C++, replacing it with a
"default:" part within the switch statement.

Note that the default rule $$ = $1, when typed, is perfectly OK under C,
but in the C++ implementation I made, this rule is different from
$<type_name>$ = $<type_name>1. I therefore think that one should implement
a Bison option where every typed default rule is explicitly written out
(rules with the same type can of course be grouped together).
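
A sketch of the proposed layout in C, with an int semantic type and hypothetical rule numbers: the default $$ = $1 sits in the default: branch of the switch rather than being assigned before it, so a rule with an explicit action assigns $$ only once. As in yacc.c, yyvsp points to the top of the value stack, so $1 of a rule of length yylen is yyvsp[1 - yylen]; for empty rules this sketch uses 0, whereas yacc.c leaves the value unspecified.

```c
#include <assert.h>

typedef int YYSTYPE;

/* Hypothetical rule numbers for a toy grammar.  */
enum
{
  RULE_EXP_PLUS = 2,   /* exp: exp '+' exp { $$ = $1 + $3; } */
  RULE_EXP_NUM = 3,    /* exp: NUM (no action) */
  RULE_EMPTY = 4       /* opt: %empty (no action) */
};

/* Compute $$ for rule YYN, whose YYLEN right-hand-side values end at
   YYVSP, the top of the value stack.  */
static YYSTYPE
sketch_reduce (int yyn, const YYSTYPE *yyvsp, int yylen)
{
  YYSTYPE yyval;
  assert (yylen >= 0);
  switch (yyn)
    {
    case RULE_EXP_PLUS:
      yyval = yyvsp[-2] + yyvsp[0];              /* $1 + $3 */
      break;
    default:
      /* No user action: the default $$ = $1 (0 for empty rules).  */
      yyval = yylen > 0 ? yyvsp[1 - yylen] : 0;
      break;
    }
  return yyval;
}
```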

* Pre and post actions.
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
Subject: YYACT_EPILOGUE
To: bug-bison@gnu.org
X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago

The other day I had the need for explicitly building the parse tree. I
used %locations for that and defined YYLLOC_DEFAULT to call a function
that returns the tree node for the production. Easy. But I also needed
to assign the S-attribute to the tree node. That cannot be done in
YYLLOC_DEFAULT, because it is invoked before the action is executed.
The way I solved this was to define a macro YYACT_EPILOGUE that would
be invoked after the action. For reasons of symmetry I also added
YYACT_PROLOGUE. Although I had no use for that I can envision how it
might come in handy for debugging purposes.
All that is needed is to add

#if YYLSP_NEEDED
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
#else
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
#endif

at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.

I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
to bison. If you're interested, I'll work on a patch.
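
A minimal sketch of how the hooks could sit in the skeleton, using the no-locations form of the call above: the skeleton turns each macro into a no-op unless the user already defined it, and invokes it around the user action. The toy YYSTYPE, the hook body (recording the number of children), the function name and the arguments given to YYACT_PROLOGUE are assumptions; the message does not specify the prologue's signature.

```c
#include <assert.h>

/* Toy semantic value: a summary of a parse tree node.  */
typedef struct
{
  int value;
  int nchildren;
} YYSTYPE;

/* User side (e.g., in the prologue): after each action, record how many
   right-hand-side symbols the node was built from.  */
#define YYACT_EPILOGUE(Val, Rhs, Len) ((Val).nchildren = (Len))

/* Skeleton side: no-op defaults unless the user defined the hooks.  */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

/* One reduction; YYVSP points to the top of the value stack.  */
static YYSTYPE
sketch_reduce_with_hooks (YYSTYPE *yyvsp, int yylen)
{
  YYSTYPE yyval = { 0, 0 };
  assert (yylen >= 1);
  YYACT_PROLOGUE ((yyvsp - yylen), yylen);
  yyval.value = yyvsp[1 - yylen].value;   /* the user action: $$ = $1 */
  YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
  return yyval;
}
```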

* Better graphics
Equip the parser with a means to create the (visual) parse tree.

* Complaint submessage indentation.
We already have an implementation that works fairly well for named
reference messages, but it would be nice to use it consistently for all
submessages from Bison. For example, the "previous definition"
submessage or the list of correct values for a %define variable might
look better with indentation.

However, the current implementation makes the assumption that the
location printed on the first line is not usually much shorter than the
locations printed on the submessage lines that follow. That assumption
may not hold true as often for some kinds of submessages, especially if
we ever support multiple grammar files.

Here's a proposal for how a new implementation might look:

  http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00086.html
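
To make the assumption concrete, here is a hypothetical sketch of the alignment rule (format_submessage, the 4-column indent and the output format are assumptions, not Bison's complain functions): a submessage is padded so that its text starts SUBMESSAGE_INDENT columns to the right of the text of the first message. When the submessage location is longer than the first one, the padding would be negative and the alignment is lost, which is the case described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { SUBMESSAGE_INDENT = 4 };

/* Format "SUB_LOC: MSG" so that MSG starts SUBMESSAGE_INDENT columns to
   the right of the text of a first message printed as "FIRST_LOC: ...".
   Returns the padding actually used.  */
static int
format_submessage (char *buf, size_t size, const char *first_loc,
                   const char *sub_loc, const char *msg)
{
  int pad = (int) strlen (first_loc) - (int) strlen (sub_loc)
            + SUBMESSAGE_INDENT;
  assert (buf && size > 0);
  if (pad < 0)
    pad = 0;   /* longer submessage location: alignment is lost */
  snprintf (buf, size, "%s: %*s%s", sub_loc, pad, "", msg);
  return pad;
}
```

For example, with a first location a.y:1.1-3, a submessage at a.y:2.1-3 gets four extra columns of padding, while one at other.y:12.10-20 gets none.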


Local Variables:
mode: outline
coding: utf-8
End:

-----

Copyright (C) 2001-2004, 2006, 2008-2012 Free Software Foundation, Inc.

This file is part of Bison, the GNU Compiler Compiler.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.