* Short term
** scan-code.l
Avoid variables for format strings, as then GCC cannot check them.
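
A minimal sketch of the point above, not taken from scan-code.l.  The
names format_message and describe_ref are hypothetical; the `format'
attribute is the GCC/Clang extension that enables the check, and it can
only do its job when the format is a literal at the call site.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical helper, not a Bison function.  The `format' attribute asks
// GCC to check the variadic arguments against the format string.
__attribute__ ((format (printf, 1, 2)))
std::string
format_message (const char *fmt, ...)
{
  char buf[128];
  va_list args;
  va_start (args, fmt);
  std::vsnprintf (buf, sizeof buf, fmt, args);
  va_end (args);
  return buf;
}

std::string
describe_ref (int n)
{
  // Checked: the literal is visible, so -Wformat can match "%d" with n.
  return format_message ("$%d", n);

  // Typically not checked, because the format is only known at run time:
  //   const char *fmt = is_location ? "@%d" : "$%d";
  //   return format_message (fmt, n);
}
```

Selecting between two literal calls, rather than between two format
variables, keeps every call site checkable.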

** m4 names
b4_shared_declarations is no longer what its name suggests. Make it
b4_parser_declaration for instance.

** glr.cc: %defines
It should not be mandatory.

** $ and others in epilogue
A stray $ is a warning in the actions, but an error in the epilogue.
IMHO, it should not even be a warning in the epilogue.

** obstack_copy etc.
There seem to be some other interesting functions for obstacks that
we should consider using.

** stack.hh
Get rid of it. The original idea is nice, but actually it makes
the code harder to follow, and uselessly different from the other
skeletons.

** Variable names.
What should we name `variant' and `lex_symbol'?

** Get rid of fake #lines [Bison: ...]
Possibly as simple as checking whether the column number is nonnegative.

I have seen messages like the following from GCC.

<built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory


** Discuss %printer/%destroy in the case of C++.
It would be very nice to provide the symbol classes with an operator<<
and a destructor. Unfortunately the syntax we have chosen for
%destroy and %printer makes them hard to reuse. For instance, the user
is invited to write something like

  %printer { debug_stream() << $$; } <my_type>;

which is hard to reuse elsewhere since it wants to use
"debug_stream()" to find the stream to use. The same applies to
%destroy: we told the user she could use the members of the Parser
class in the printers/destructors, which is not good for an operator<<
since it is no longer bound to a particular parser: it's just a
standalone symbol.
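
To make the contrast concrete, here is a sketch of a printer that
receives its stream as an argument instead of calling debug_stream().
The type my_type and the function trace_symbol are hypothetical
stand-ins, not anything Bison generates.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical semantic type standing in for <my_type>.
struct my_type
{
  int value;
};

// A free operator<< gets its stream as an argument, so it needs neither
// debug_stream() nor any other member of a parser object.
std::ostream &
operator<< (std::ostream &o, const my_type &v)
{
  return o << "my_type(" << v.value << ')';
}

// Hypothetical stand-in for the parser side: the caller chooses the
// stream, the printer does not.
std::string
trace_symbol (const my_type &v)
{
  std::ostringstream debug;
  debug << v;
  return debug.str ();
}
```

Such an operator can be reused outside the parser, which the
%printer form above does not allow.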

** Rename LR0.cc
as lr0.cc, why upper case?

** bench several bisons.
Enhance bench.pl with %b to run different bisons.

* Various
** YYERRCODE
Defined to 256, but not used, not documented. Probably the token
number for the error token, which POSIX wants to be 256, but which
Bison might renumber if the user used number 256. Keep, fix and
document? Throw away?

Also, why don't we output the token name of the error token in the
output? It is explicitly skipped:

  /* Skip error token and tokens without identifier. */
  if (sym != errtoken && id)

Of course there are issues with name spaces, but if we disable this
skip we have something which seems to be simpler and more consistent
than the special case YYERRCODE.

  enum yytokentype {
    error = 256,
    // ...
  };


We could (should?) also treat the case of the undef_token, which is
numbered 257 for yylex, and 2 internally. Both appear for instance in
toknum:

  const unsigned short int
  parser::yytoken_number_[] =
  {
    0, 256, 257, 258, 259, 260, 261, 262, 263, 264,

while here

  enum yytokentype {
    TOK_EOF = 0,
    TOK_EQ = 258,

so both 256 and 257 are "mysterious".

  const char*
  const parser::yytname_[] =
  {
    "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",


** yychar == yyempty_
The code in yyerrlab reads:

  if (yychar <= YYEOF)
    {
      /* Return failure if at end of input. */
      if (yychar == YYEOF)
        YYABORT;
    }

There are only two yychar that can be <= YYEOF: YYEMPTY and YYEOF.
But I can't produce the situation where yychar is YYEMPTY here; is it
really possible? The test suite does not exercise this case.

This shows that it would be interesting to add skeleton coverage
analysis to the test suite.

** Table definitions
It should be very easy to factor the definition of the various tables,
including the separation between declaration and definition. See for
instance b4_table_define in lalr1.cc. This way, we could even factor
C vs. C++ definitions.

* From lalr1.cc to yacc.c
** Single stack
Merging the three stacks in lalr1.cc simplified the code, prompted
other improvements and also made it faster (probably because memory
management is performed once instead of three times). I suggest that
we do the same in yacc.c.
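
As an illustration only, a single stack can hold one record per symbol
instead of keeping states, values and locations in three parallel
stacks.  All names below (stack_symbol, parser_stack, location) are
hypothetical and do not reproduce the lalr1.cc code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical location type.
struct location
{
  int first_line;
  int last_line;
};

// One record per stacked symbol: what used to live in three stacks.
struct stack_symbol
{
  int state;
  double value;   // stands in for the semantic value type
  location loc;
};

class parser_stack
{
public:
  void push (int state, double value, location loc)
  {
    // A single container, hence a single reallocation when it grows.
    symbols_.push_back (stack_symbol{state, value, loc});
  }

  // Pop the N top symbols, as a reduction does.
  void pop (std::size_t n = 1)
  {
    symbols_.resize (symbols_.size () - n);
  }

  const stack_symbol &top () const { return symbols_.back (); }
  std::size_t size () const { return symbols_.size (); }

private:
  std::vector<stack_symbol> symbols_;
};
```

With this layout, growing the stack is one memory operation rather than
three, which matches the guess above about where the speedup comes from.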

** yysyntax_error
The code in glr.c and yacc.c is really alike; we can certainly factor
some parts.


* Report

** Figures
Some statistics about the grammar and the parser would be useful,
especially when asking the user to send some information about the
grammars she is working on. We should probably also include some
information about the variables (I'm not sure for instance we even
specify what LR variant was used).

** GLR
How would Paul like to display the conflicted actions? In particular,
what should we do when two reductions are possible on a given
lookahead token, but one is part of $default? Should we make the two
reductions explicit, or just keep $default? See the following point.

** Disabled Reductions
See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
what we want to do.

** Documentation
Extend with error productions. The hard part will probably be finding
the right rule so that a single state does not exhibit too many yet
undocumented ``features''. Maybe an empty action ought to be
presented too. Shall we try to make a single grammar with all these
features, or should we have several very small grammars?

** --report=conflict-path
Provide better assistance for understanding the conflicts by providing
a sample text exhibiting the (LALR) ambiguity. See the paper from
DeRemer and Pennello: they already provide the algorithm.

** Statically check for potential ambiguities in GLR grammars.
See <http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.


* Extensions

** $-1
We should find a means to provide access to values deep in the
stack. For instance, instead of

  baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }

we should be able to have:

  foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }

Or something like this.

** %if and the like
It should be possible to have %if/%else/%endif. The implementation is
not clear: should it be lexical or syntactic? Vadim Maslow thinks it
must be in the scanner: we must not parse what is in a switched off
part of %if. Akim Demaille thinks it should be in the parser, so as
to avoid falling into another CPP mistake.

** XML Output
There are a couple of available extensions of Bison targeting some XML
output. Some day we should consider including them. One issue is
that they seem to be quite orthogonal to the parsing technique, and
seem to depend mostly on the possibility to have some code triggered
for each reduction. As a matter of fact, such hooks could also be
used to generate the yydebug traces. Some generic scheme probably
exists in there.

XML output for GNU Bison and gcc
  http://www.cs.may.ie/~jpower/Research/bisonXML/

XML output for GNU Bison
  http://yaxx.sourceforge.net/

* Unit rules
Maybe we could expand unit rules, i.e., transform

  exp: arith | bool;
  arith: exp '+' exp;
  bool: exp '&' exp;

into

  exp: exp '+' exp | exp '&' exp;

when there are no actions. This can significantly speed up some
grammars. I can't find the papers. In particular the book `LR
parsing: Theory and Practice' is impossible to find, but according to
`Parsing Techniques: a Practical Guide', it includes information about
this issue. Does anybody have it?
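
The transformation can be sketched on a toy grammar representation.
This is an assumption-laden illustration, not an algorithm from the
books cited above: grammar_t, rhs_t and expand_unit_rules are
hypothetical names, actions are assumed absent, and the nonterminals
left unreachable after expansion are not removed.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

typedef std::vector<std::string> rhs_t;
typedef std::map<std::string, std::vector<rhs_t> > grammar_t;

// Collect the alternatives of LHS, replacing each unit alternative (a
// lone nonterminal) by the alternatives of that nonterminal.  SEEN
// guards against cycles such as "a: b; b: a;".
static void
collect (const grammar_t &g, const std::string &lhs,
         std::set<std::string> &seen, std::vector<rhs_t> &out)
{
  if (!seen.insert (lhs).second)
    return;
  const std::vector<rhs_t> &alts = g.find (lhs)->second;
  for (std::size_t i = 0; i < alts.size (); ++i)
    if (alts[i].size () == 1 && g.count (alts[i][0]))
      collect (g, alts[i][0], seen, out);
    else
      out.push_back (alts[i]);
}

// Alternatives of LHS with unit rules expanded.  LHS must be a
// nonterminal of G.
std::vector<rhs_t>
expand_unit_rules (const grammar_t &g, const std::string &lhs)
{
  std::set<std::string> seen;
  std::vector<rhs_t> res;
  collect (g, lhs, seen, res);
  return res;
}
```

Applied to the grammar above, the expansion of exp yields the two
alternatives exp '+' exp and exp '&' exp.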



* Documentation

** History/Bibliography
Some history of Bison and some bibliography would be most welcome.
Are there any Texinfo standards for bibliography?

* Coding system independence
Paul notes:

  Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
  255). It also assumes that the 8-bit character encoding is
  the same for the invocation of 'bison' as it is for the
  invocation of 'cc', but this is not necessarily true when
  people run bison on an ASCII host and then use cc on an EBCDIC
  host. I don't think these topics are worth our time
  addressing (unless we find a gung-ho volunteer for EBCDIC or
  PDP-10 ports :-) but they should probably be documented
  somewhere.

  More importantly, Bison does not currently allow NUL bytes in
  tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
  the source code. This should get fixed.
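
A small sketch of both remarks.  It only shows why NUL bytes are awkward
for NUL-terminated strings; whether that is the actual cause of the
limitation in Bison is an assumption, and the names below are
hypothetical.

```cpp
#include <climits>
#include <cstddef>
#include <cstring>

// The 8-bit byte assumption can at least be made explicit.
static_assert (UCHAR_MAX == 255, "8-bit bytes assumed");

// A token containing a NUL byte, as in the "x\0y" example.
static const char token[] = "x\0y";

// Length as seen by NUL-terminated string functions: stops at the NUL.
std::size_t
c_string_length ()
{
  return std::strlen (token);
}

// Actual number of bytes in the token, excluding the final NUL.
std::size_t
byte_length ()
{
  return sizeof token - 1;
}
```

Any code path that measures the token with strlen sees one byte where
there are three, so tokens would have to carry an explicit length.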

* --graph
Show reductions.

* Broken options?
** %token-table
** Skeleton strategy
Must we keep %token-table?

* Precedence

** Partial order
It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders (sounds like series/parallel orders to me).
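
One possible reading of "partial order", sketched with hypothetical
names: precedence relations are declared pairwise, and two tokens
compare only if one is reachable from the other.  This is a plain
reachability check, not the series/parallel formulation hinted at above.

```cpp
#include <map>
#include <set>
#include <string>

enum prec_order { prec_same, prec_lower, prec_higher, prec_unordered };

class precedence
{
public:
  // Declare that HI binds tighter than LO.
  void add (const std::string &lo, const std::string &hi)
  {
    above_[lo].insert (hi);
  }

  // How A relates to B; tokens from unrelated groups are unordered.
  prec_order compare (const std::string &a, const std::string &b) const
  {
    if (a == b)
      return prec_same;
    std::set<std::string> seen;
    if (reaches (a, b, seen))
      return prec_lower;
    seen.clear ();
    if (reaches (b, a, seen))
      return prec_higher;
    return prec_unordered;
  }

private:
  typedef std::map<std::string, std::set<std::string> > relation;

  // Whether TO is reachable from FROM through declared relations
  // (depth-first search; SEEN avoids revisiting tokens).
  bool reaches (const std::string &from, const std::string &to,
                std::set<std::string> &seen) const
  {
    if (!seen.insert (from).second)
      return false;
    relation::const_iterator it = above_.find (from);
    if (it == above_.end ())
      return false;
    for (std::set<std::string>::const_iterator i = it->second.begin ();
         i != it->second.end (); ++i)
      if (*i == to || reaches (*i, to, seen))
        return true;
    return false;
  }

  relation above_;
};
```

Two grammar modules can then each declare their own relations without
having to rank their tokens against each other.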

** RR conflicts
See if we can use precedence between rules to solve RR conflicts. See
what POSIX says.


* $undefined
From Hans:
- If the Bison generated parser encounters an undefined number in the
character range, that character is written out in diagnostic messages, in
addition to the $undefined value.

Suggest: Change the name $undefined to undefined; looks better in outputs.


* Default Action
From Hans:
- For use with my C++ parser, I moved the "switch (yyn)" statement
that Bison writes to the bison.simple skeleton file. This way, I can remove
the current default rule $$ = $1 implementation, which causes a double
assignment to $$ which may not be OK under C++, replacing it with a
"default:" part within the switch statement.

Note that the default rule $$ = $1, when typed, is perfectly OK under C,
but in the C++ implementation I made, this rule is different from
$<type_name>$ = $<type_name>1. I therefore think that one should implement
a Bison option where every typed default rule is explicitly written out
(same typed rules can of course be grouped together).

* Pre and post actions.
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
Subject: YYACT_EPILOGUE
To: bug-bison@gnu.org
X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago

The other day I had the need for explicitly building the parse tree. I
used %locations for that and defined YYLLOC_DEFAULT to call a function
that returns the tree node for the production. Easy. But I also needed
to assign the S-attribute to the tree node. That cannot be done in
YYLLOC_DEFAULT, because it is invoked before the action is executed.
The way I solved this was to define a macro YYACT_EPILOGUE that would
be invoked after the action. For reasons of symmetry I also added
YYACT_PROLOGUE. Although I had no use for that I can envision how it
might come in handy for debugging purposes.
All that is needed is to add

#if YYLSP_NEEDED
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
#else
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
#endif

at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.

I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
to bison. If you're interested, I'll work on a patch.
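
A sketch of how such hooks could behave, using the three-argument form
from the message.  Everything here is a hypothetical stand-in: the
function reduce_add mimics one reduction of a generated parser, and
epilogue_log exists only to make the hook observable.

```cpp
#include <vector>

// Hypothetical log so that the user hook has a visible effect.
static std::vector<int> epilogue_log;

// What a user could define: record the value computed by each action.
#define YYACT_EPILOGUE(Val, Rhs, Len) epilogue_log.push_back (Val)

// What the skeleton could provide: hooks default to no-ops.
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Val, Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

// Stand-in for the reduction of "exp: exp '+' exp".  YYVSP points at
// the top of the value stack, so $1 is yyvsp[-2] and $3 is yyvsp[0].
int
reduce_add (int *yyvsp)
{
  const int yylen = 3;
  int yyval = yyvsp[1 - yylen];          // default action: $$ = $1
  YYACT_PROLOGUE (yyval, (yyvsp - yylen), yylen);
  yyval = yyvsp[-2] + yyvsp[0];          // user action: $$ = $1 + $3
  YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
  return yyval;
}
```

The epilogue runs after the action, so it sees the final $$, which is
what YYLLOC_DEFAULT cannot do.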

* Better graphics
Equip the parser with a means to create the (visual) parse tree.

* Complaint submessage indentation.
We already have an implementation that works fairly well for named
reference messages, but it would be nice to use it consistently for all
submessages from Bison. For example, the "previous definition"
submessage or the list of correct values for a %define variable might
look better with indentation.

However, the current implementation makes the assumption that the
location printed on the first line is not usually much shorter than the
locations printed on the submessage lines that follow. That assumption
may not hold true as often for some kinds of submessages especially if
we ever support multiple grammar files.

Here's a proposal for how a new implementation might look:

  http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00086.html


Local Variables:
mode: outline
coding: utf-8
End:

-----

Copyright (C) 2001-2004, 2006, 2008-2012 Free Software Foundation, Inc.

This file is part of Bison, the GNU Compiler Compiler.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
361MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
362GNU General Public License for more details.
363
364You should have received a copy of the GNU General Public License
365along with this program. If not, see <http://www.gnu.org/licenses/>.