* Short term
** Variable names.
What should we name `variant' and `lex_symbol'?

** Get rid of fake #lines [Bison: ...]
Possibly as simple as checking whether the column number is nonnegative.

I have seen messages like the following from GCC.

<built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory


** Discuss %printer/%destroy in the case of C++.
It would be very nice to provide the symbol classes with an operator<<
and a destructor. Unfortunately the syntax we have chosen for
%destroy and %printer makes them hard to reuse. For instance, the user
is invited to write something like

    %printer { debug_stream() << $$; } <my_type>;

which is hard to reuse elsewhere since it wants to use
"debug_stream()" to find the stream to use. The same applies to
%destroy: we told the user she could use the members of the Parser
class in the printers/destructors, which is not good for an operator<<
since it is no longer bound to a particular parser; it only sees a
standalone symbol.

** Rename LR0.cc
as lr0.cc, why upper case?

** bench several bisons.
Enhance bench.pl with %b to run different bisons.

* Various
** YYERRCODE
Defined to 256, but not used, not documented. Probably the token
number for the error token, which POSIX wants to be 256, but which
Bison might renumber if the user used number 256. Should we keep it,
fix it, and document it? Or throw it away?

Also, why don't we output the token name of the error token in the
output? It is explicitly skipped:

    /* Skip error token and tokens without identifier. */
    if (sym != errtoken && id)

Of course there are issues with name spaces, but if we disable this
skip we get something that seems simpler and more consistent than
the special case YYERRCODE.

    enum yytokentype {
      error = 256,
      // ...
    };


We could (should?) also treat the case of the undef_token, which is
numbered 257 for yylex, and 2 internally. Both appear for instance in
toknum:

    const unsigned short int
    parser::yytoken_number_[] =
    {
      0, 256, 257, 258, 259, 260, 261, 262, 263, 264,

while here

    enum yytokentype {
      TOK_EOF = 0,
      TOK_EQ = 258,

so both 256 and 257 are "mysterious".

    const char*
    const parser::yytname_[] =
    {
      "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",


** YYFAIL
It seems to be *really* obsolete now; shall we remove it?

** yychar == yyempty_
The code in yyerrlab reads:

    if (yychar <= YYEOF)
      {
        /* Return failure if at end of input. */
        if (yychar == YYEOF)
          YYABORT;
      }

There are only two values of yychar that can be <= YYEOF: YYEMPTY and
YYEOF. But I can't produce the situation where yychar is YYEMPTY here;
is it really possible? The test suite does not exercise this case.

This shows that it would be worthwhile to add skeleton coverage
analysis to the test suite.

** Table definitions
It should be very easy to factor the definition of the various tables,
including the separation between declaration and definition. See for
instance b4_table_define in lalr1.cc. This way, we could even factor
C vs. C++ definitions.

* From lalr1.cc to yacc.c
** Single stack
Merging the three stacks in lalr1.cc simplified the code, prompted
other improvements, and also made it faster (probably because memory
management is performed once instead of three times). I suggest that
we do the same in yacc.c.
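
A minimal sketch of what a merged stack could look like in C. All
names (stack_symbol, symbol_stack, stack_push, ...) are hypothetical,
`int' stands in for YYSTYPE, and this is not the actual yacc.c layout;
it only illustrates that one buffer means one growth step instead of
three.

```c
#include <stdlib.h>

/* Hypothetical stand-in for YYLTYPE.  */
typedef struct { int first_line, last_line; } loc_t;

/* One element bundles what would otherwise live in three parallel
   stacks: state, semantic value, location.  */
typedef struct
{
  int state;
  int value;      /* stands in for YYSTYPE */
  loc_t location;
} stack_symbol;

typedef struct
{
  stack_symbol *base;
  size_t size;
  size_t capacity;
} symbol_stack;

/* Push one symbol, growing the single buffer when it is full.
   Return 0 on success, -1 on memory exhaustion.  */
static int
stack_push (symbol_stack *s, stack_symbol sym)
{
  if (s->size == s->capacity)
    {
      size_t cap = s->capacity ? 2 * s->capacity : 16;
      stack_symbol *p = realloc (s->base, cap * sizeof *p);
      if (!p)
        return -1;
      s->base = p;
      s->capacity = cap;
    }
  s->base[s->size++] = sym;
  return 0;
}

/* Pop N symbols at once, as a reduction does.  */
static void
stack_pop (symbol_stack *s, size_t n)
{
  s->size -= n < s->size ? n : s->size;
}

static void
stack_free (symbol_stack *s)
{
  free (s->base);
  s->base = NULL;
  s->size = s->capacity = 0;
}
```

Start from a zeroed symbol_stack; a reduction of length N is then a
single stack_pop (&s, N) followed by one stack_push.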

** yysyntax_error
The code in glr.c and yacc.c is really alike; we can certainly factor
some parts.


* Report

** Figures
Some statistics about the grammar and the parser would be useful,
especially when asking the user to send some information about the
grammars she is working on. We should probably also include some
information about the variables (I'm not sure for instance we even
specify what LR variant was used).

** GLR
How would Paul like to display the conflicted actions? In particular,
what should we do when two reductions are possible on a given lookahead
token, but one is part of $default? Should we make the two reductions
explicit, or just keep $default? See the following point.

** Disabled Reductions
See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
what we want to do.

** Documentation
Extend with error productions. The hard part will probably be finding
the right rule so that a single state does not exhibit too many yet
undocumented ``features''. Maybe an empty action ought to be
presented too. Shall we try to make a single grammar with all these
features, or should we have several very small grammars?

** --report=conflict-path
Provide better assistance for understanding the conflicts by providing
a sample text exhibiting the (LALR) ambiguity. See the paper from
DeRemer and Pennello: they already provide the algorithm.

** Statically check for potential ambiguities in GLR grammars. See
<http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.


* Extensions

** $-1
We should find a means to provide access to values deep in the
stack. For instance, instead of

    baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }

we should be able to have:

    foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }

Or something like this.

** %if and the like
It should be possible to have %if/%else/%endif. The implementation is
not clear: should it be lexical or syntactic? Vadim Maslow thinks it
must be in the scanner: we must not parse what is in a switched off
part of %if. Akim Demaille thinks it should be in the parser, so as
to avoid falling into another CPP mistake.

** XML Output
There are a couple of available extensions of Bison targeting some XML
output. Some day we should consider including them. One issue is
that they seem to be quite orthogonal to the parsing technique, and
seem to depend mostly on the possibility to have some code triggered
for each reduction. As a matter of fact, such hooks could also be
used to generate the yydebug traces. Some generic scheme probably
exists in there.

XML output for GNU Bison and gcc
    http://www.cs.may.ie/~jpower/Research/bisonXML/

XML output for GNU Bison
    http://yaxx.sourceforge.net/

* Unit rules
Maybe we could expand unit rules, i.e., transform

    exp: arith | bool;
    arith: exp '+' exp;
    bool: exp '&' exp;

into

    exp: exp '+' exp | exp '&' exp;

when there are no actions. This can significantly speed up some
grammars. I can't find the papers. In particular the book `LR
parsing: Theory and Practice' is impossible to find, but according to
`Parsing Techniques: a Practical Guide', it includes information about
this issue. Does anybody have it?
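
The transformation above can be sketched in C as follows. This is
only an illustration under simplifying assumptions, not the algorithm
from the books cited: the `rule' type, FIRST_NTERM and
expand_unit_rules are hypothetical names, only one level of unit
rules is handled (no chains such as A: B, B: C), and rules are
assumed to carry no actions.

```c
#include <stddef.h>

/* Symbols are small ints; values below FIRST_NTERM are tokens.  */
enum { MAX_RHS = 8, MAX_RULES = 32, FIRST_NTERM = 256 };

typedef struct
{
  int lhs;
  int rhs[MAX_RHS];
  size_t len;
} rule;

/* A unit rule has a single nonterminal as its right-hand side.  */
static int
is_unit (const rule *r)
{
  return r->len == 1 && r->rhs[0] >= FIRST_NTERM;
}

/* Copy the N rules of IN to OUT (capacity MAX_RULES), replacing each
   unit rule "A: B" by one rule "A: gamma" per non-unit rule
   "B: gamma".  Return the number of rules written, or -1 if OUT is
   too small.  */
static int
expand_unit_rules (const rule *in, size_t n, rule *out)
{
  int m = 0;
  for (size_t i = 0; i < n; ++i)
    {
      if (!is_unit (&in[i]))
        {
          if (m == MAX_RULES)
            return -1;
          out[m++] = in[i];
          continue;
        }
      for (size_t j = 0; j < n; ++j)
        if (in[j].lhs == in[i].rhs[0] && !is_unit (&in[j]))
          {
            if (m == MAX_RULES)
              return -1;
            out[m] = in[j];
            out[m].lhs = in[i].lhs;   /* rewrite "B: gamma" as "A: gamma" */
            ++m;
          }
    }
  return m;
}
```

On the example grammar this yields the two expanded `exp' rules, and
leaves the rules for `arith' and `bool' in place although nothing
refers to them any longer; dropping such unreachable rules would be a
separate pass.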



* Documentation

** History/Bibliography
Some history of Bison and some bibliography would be most welcome.
Are there any Texinfo standards for bibliography?

* Coding system independence
Paul notes:

    Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
    255). It also assumes that the 8-bit character encoding is
    the same for the invocation of 'bison' as it is for the
    invocation of 'cc', but this is not necessarily true when
    people run bison on an ASCII host and then use cc on an EBCDIC
    host. I don't think these topics are worth our time
    addressing (unless we find a gung-ho volunteer for EBCDIC or
    PDP-10 ports :-) but they should probably be documented
    somewhere.

    More importantly, Bison does not currently allow NUL bytes in
    tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
    the source code. This should get fixed.
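
To illustrate the NUL problem: any code that treats token text as a
NUL-terminated C string stops at the first NUL byte. A common remedy,
sketched below, is to carry an explicit length. The token_text type
and its helpers are hypothetical; this is not a description of how
Bison stores tokens or of how the fix would be done.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: a token's text with an explicit length, so that
   embedded NUL bytes are ordinary content.  */
typedef struct
{
  const char *bytes;
  size_t len;
} token_text;

/* Build from a string literal.  sizeof counts embedded NULs and the
   terminating one, hence the - 1.  */
#define TOKEN_TEXT(Lit) { (Lit), sizeof (Lit) - 1 }

/* Compare two token texts byte for byte, NULs included.  */
static int
token_text_equal (token_text a, token_text b)
{
  return a.len == b.len && memcmp (a.bytes, b.bytes, a.len) == 0;
}
```

With this, "x\0y" and "x\0z" are different three-byte tokens, whereas
strcmp sees both as the one-byte string "x".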

* --graph
Show reductions.

* Broken options ?
** %token-table
** Skeleton strategy
Must we keep %token-table?

* Precedence

** Partial order
It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders (sounds like series/parallel orders to me).
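
A rough sketch of what a partial order could look like, assuming
precedence groups are numbered from 0 and relations are declared
pairwise. All names (prec_order, prec_declare, prec_compare, NGROUPS)
are made up for the illustration, and the transitive closure uses
Warshall's algorithm; nothing here reflects existing Bison code.

```c
#include <stdbool.h>

enum { NGROUPS = 5 };   /* hypothetical number of precedence groups */

typedef enum
{
  PREC_UNRELATED, PREC_LOWER, PREC_HIGHER, PREC_SAME
} prec_cmp;

/* above[a][b] is true when group A binds tighter than group B.  */
typedef struct { bool above[NGROUPS][NGROUPS]; } prec_order;

/* Record a declared relation "A binds tighter than B".  */
static void
prec_declare (prec_order *p, int a, int b)
{
  p->above[a][b] = true;
}

/* Warshall's algorithm: make the declared relation transitive.  */
static void
prec_close (prec_order *p)
{
  for (int k = 0; k < NGROUPS; ++k)
    for (int i = 0; i < NGROUPS; ++i)
      for (int j = 0; j < NGROUPS; ++j)
        if (p->above[i][k] && p->above[k][j])
          p->above[i][j] = true;
}

/* Unlike with a total order, two groups may be unrelated.  */
static prec_cmp
prec_compare (const prec_order *p, int a, int b)
{
  if (a == b)
    return PREC_SAME;
  if (p->above[a][b])
    return PREC_HIGHER;
  if (p->above[b][a])
    return PREC_LOWER;
  return PREC_UNRELATED;
}
```

Two modules could then each declare their own chain (say arithmetic
and Boolean operators) without having to rank one against the other;
presumably a conflict between unrelated groups would simply stay
unresolved and be reported.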

** RR conflicts
See if we can use precedence between rules to solve RR conflicts. See
what POSIX says.


* $undefined
From Hans:
- If the Bison generated parser experiences an undefined number in the
character range, that character is written out in diagnostic messages, in
addition to the $undefined value.

Suggest: Change the name $undefined to undefined; looks better in outputs.


* Default Action
From Hans:
- For use with my C++ parser, I transported the "switch (yyn)" statement
that Bison writes to the bison.simple skeleton file. This way, I can remove
the current default rule $$ = $1 implementation, which causes a double
assignment to $$ which may not be OK under C++, replacing it with a
"default:" part within the switch statement.

Note that the default rule $$ = $1, when typed, is perfectly OK under C,
but in the C++ implementation I made, this rule is different from
$<type_name>$ = $<type_name>1. I therefore think that one should implement
a Bison option where every typed default rule is explicitly written out
(same-typed rules can of course be grouped together).
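
A toy version of the idea, to make it concrete. The function and its
parameters are hypothetical stand-ins for the skeleton's yyn, yyvsp,
yylen and yyval, and semantic values are plain ints; the point is only
that $$ is assigned exactly once per reduction, with $$ = $1 living in
the "default:" branch.

```c
/* Toy reduction step.  RHS points at the values of the LEN
   right-hand side symbols, so $1 is RHS[0].  */
static int
reduce (int rule, const int *rhs, int len)
{
  int val;
  switch (rule)
    {
    case 1:                     /* exp: exp '+' exp  { $$ = $1 + $3; } */
      val = rhs[0] + rhs[2];
      break;
    case 2:                     /* exp: '-' exp      { $$ = -$2; } */
      val = -rhs[1];
      break;
    default:                    /* no user action: $$ = $1 */
      /* For an empty rule there is no $1; 0 is an arbitrary choice
         made for this sketch.  */
      val = len > 0 ? rhs[0] : 0;
      break;
    }
  return val;
}
```

Rules 1 and 2 never execute the default assignment, which is what
avoids the double assignment mentioned above.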

* Pre and post actions.
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
Subject: YYACT_EPILOGUE
To: bug-bison@gnu.org
X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago

The other day I had the need for explicitly building the parse tree. I
used %locations for that and defined YYLLOC_DEFAULT to call a function
that returns the tree node for the production. Easy. But I also needed
to assign the S-attribute to the tree node. That cannot be done in
YYLLOC_DEFAULT, because it is invoked before the action is executed.
The way I solved this was to define a macro YYACT_EPILOGUE that would
be invoked after the action. For reasons of symmetry I also added
YYACT_PROLOGUE. Although I had no use for that I can envision how it
might come in handy for debugging purposes.
All that is needed is to add

#if YYLSP_NEEDED
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
#else
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
#endif

at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.

I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
to bison. If you're interested, I'll work on a patch.
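
For reference, a sketch of how such hooks might be wired, using the
three-argument form from the message. Only the macro names come from
the message; the counters, reduce_add, and the simplification that RHS
points directly at $1 (rather than yyvsp - yylen) are assumptions made
for the illustration.

```c
/* User side (hypothetical): count reductions and remember the last
   semantic value seen by the epilogue.  */
static int n_reductions;
static int last_value;
#define YYACT_EPILOGUE(Val, Rhs, Len) \
  (n_reductions++, last_value = (Val))

/* Skeleton side: default to no-ops when the user defines nothing.  */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Val, Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

/* Toy reduction for "exp: exp '+' exp"; RHS holds $1..$LEN.  */
static int
reduce_add (const int *rhs, int len)
{
  int val = rhs[0];               /* default $$ = $1 */
  YYACT_PROLOGUE (val, rhs, len);
  val = rhs[0] + rhs[2];          /* the user action */
  YYACT_EPILOGUE (val, rhs, len);
  return val;
}
```

Guarding the defaults with #ifndef keeps grammars that define neither
macro unaffected.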

* Better graphics
Equip the parser with a means to create the (visual) parse tree.

* Complaint submessage indentation.
We already have an implementation that works fairly well for named
reference messages, but it would be nice to use it consistently for all
submessages from Bison. For example, the "previous definition"
submessage or the list of correct values for a %define variable might
look better with indentation.

However, the current implementation makes the assumption that the
location printed on the first line is not usually much shorter than the
locations printed on the submessage lines that follow. That assumption
may not hold true as often for some kinds of submessages especially if
we ever support multiple grammar files.

Here's a proposal for how a new implementation might look:

    http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00086.html


Local Variables:
mode: outline
coding: utf-8
End:

-----

Copyright (C) 2001-2004, 2006, 2008-2012 Free Software Foundation, Inc.

This file is part of Bison, the GNU Compiler Compiler.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.