* Short term
** Variable names.
What should we name `variant' and `lex_symbol'?

** Update the documentation on gnu.org

** Get rid of fake #lines [Bison: ...]
Possibly as simple as checking whether the column number is nonnegative.

I have seen messages like the following from GCC.

<built-in>:0: fatal error: opening dependency file .deps/libltdl/argz.Tpo: No such file or directory

** Discuss %printer/%destroy in the case of C++.
It would be very nice to provide the symbol classes with an operator<<
and a destructor. Unfortunately the syntax we have chosen for
%destroy and %printer makes them hard to reuse. For instance, the user
is invited to write something like

  %printer { debug_stream() << $$; } <my_type>;

which is hard to reuse elsewhere since it wants to use
"debug_stream()" to find the stream to use. The same applies to
%destroy: we told the user she could use the members of the Parser
class in the printers/destructors, which is not good for an operator<<
since it is no longer bound to a particular parser: it is just a
standalone symbol.

** Rename LR0.cc
as lr0.cc, why upper case?

** bench several bisons.
Enhance bench.pl with %b to run different bisons.

* Various
** Warnings
Warnings about type tags that are used in printer and dtors, but not
for symbols?

** YYERRCODE
Defined to 256, but not used, not documented. Probably the token
number for the error token, which POSIX wants to be 256, but which
Bison might renumber if the user used number 256. Keep, fix and
document? Throw away?

Also, why don't we output the token name of the error token in the
output? It is explicitly skipped:

  /* Skip error token and tokens without identifier. */
  if (sym != errtoken && id)

Of course there are issues with name spaces, but if we disable this
skip we get something that seems simpler and more consistent than
the special case YYERRCODE.

  enum yytokentype {
    error = 256,
    // ...
  };

We could (should?) also treat the case of the undef_token, which is
numbered 257 for yylex, and 2 internally. Both appear for instance in
toknum:

  const unsigned short int
  parser::yytoken_number_[] =
  {
    0, 256, 257, 258, 259, 260, 261, 262, 263, 264,

while here

  enum yytokentype {
    TOK_EOF = 0,
    TOK_EQ = 258,

so both 256 and 257 are "mysterious".

  const char*
  const parser::yytname_[] =
  {
    "\"end of command\"", "error", "$undefined", "\"=\"", "\"break\"",

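To make the two numberings concrete, here is a minimal sketch. It
assumes the layout of the excerpts above (index = internal symbol
number, value = number seen by yylex). The names `token_number' and
`internal_number' are hypothetical, they do not exist in any skeleton.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical copy of the first entries of yytoken_number_ quoted above:
// index = internal symbol number, value = number returned by yylex.
static const unsigned short token_number[] = { 0, 256, 257, 258, 259, 260 };

// Hypothetical helper: map a yylex number to its internal symbol number.
// Numbers absent from the table map to 2, the internal number of $undefined.
static int internal_number(int lex_number)
{
  const std::size_t n = sizeof token_number / sizeof token_number[0];
  for (std::size_t i = 0; i < n; ++i)
    if (token_number[i] == lex_number)
      return static_cast<int>(i);
  return 2;
}
```

With this table, 256 (error) and 257 ($undefined) are the only yylex
numbers that have an internal symbol but no name in yytokentype.
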
** YYFAIL
It seems to be *really* obsolete now, shall we remove it?

** yychar == yyempty_
The code in yyerrlab reads:

  if (yychar <= YYEOF)
    {
      /* Return failure if at end of input. */
      if (yychar == YYEOF)
        YYABORT;
    }

There are only two values of yychar that can be <= YYEOF: YYEMPTY and
YYEOF. But I can't produce the situation where yychar is YYEMPTY here,
is it really possible? The test suite does not exercise this case.

This shows that it would be interesting to add skeleton coverage
analysis to the test suite.

** Table definitions
It should be very easy to factor the definition of the various tables,
including the separation between declaration and definition. See for
instance b4_table_define in lalr1.cc. This way, we could even factor
C vs. C++ definitions.

* From lalr1.cc to yacc.c
** Single stack
Merging the three stacks in lalr1.cc simplified the code, prompted
other improvements and also made it faster (probably because memory
management is performed once instead of three times). I suggest that
we do the same in yacc.c.

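A minimal sketch of the idea, not the actual lalr1.cc code: each stack
slot carries the state, the semantic value and the location together.
All the type and member names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the semantic value and location types.
struct semantic_value { int ival; };
struct location { int first_line; int last_line; };

// One record per stack slot, instead of three parallel stacks.
struct stack_symbol
{
  int state;
  semantic_value value;
  location loc;
};

struct parser_stack
{
  std::vector<stack_symbol> slots;

  // A single push grows one container, so there is one (re)allocation
  // where three separate stacks would need three.
  void push(int state, semantic_value v, location l)
  {
    slots.push_back(stack_symbol{state, v, l});
  }

  // Pop the N right-hand side symbols of a reduced rule in one step.
  // Precondition: n <= slots.size().
  void pop(std::size_t n)
  {
    slots.resize(slots.size() - n);
  }

  const stack_symbol& top() const { return slots.back(); }
};
```
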
** yysyntax_error
The code in glr.c and yacc.c is very similar, we can certainly factor
some parts.

* Report

** Figures
Some statistics about the grammar and the parser would be useful,
especially when asking the user to send some information about the
grammars she is working on. We should probably also include some
information about the variables (I'm not sure for instance we even
specify what LR variant was used).

** GLR
How would Paul like to display the conflicted actions? In particular,
what about the case where two reductions are possible on a given
lookahead token, but one is part of $default? Should we make the two
reductions explicit, or just keep $default? See the following point.

** Disabled Reductions
See `tests/conflicts.at (Defaulted Conflicted Reduction)', and decide
what we want to do.

** Documentation
Extend with error productions. The hard part will probably be finding
the right rule so that a single state does not exhibit too many yet
undocumented ``features''. Maybe an empty action ought to be
presented too. Shall we try to make a single grammar with all these
features, or should we have several very small grammars?

** --report=conflict-path
Provide better assistance for understanding the conflicts by providing
a sample text exhibiting the (LALR) ambiguity. See the paper from
DeRemer and Pennello: they already provide the algorithm.

** Statically check for potential ambiguities in GLR grammars.
See <http://www.i3s.unice.fr/~schmitz/papers.html#expamb> for an approach.

* Extensions

** $-1
We should find a means to provide access to values deep in the
stack. For instance, instead of

  baz: qux { $$ = $<foo>-1 + $<bar>0 + $1; }

we should be able to have:

  foo($foo) bar($bar) baz($baz): qux($qux) { $baz = $foo + $bar + $qux; }

Or something like this.

** %if and the like
It should be possible to have %if/%else/%endif. The implementation is
not clear: should it be lexical or syntactic? Vadim Maslow thinks it
must be in the scanner: we must not parse what is in a switched off
part of %if. Akim Demaille thinks it should be in the parser, so as
to avoid falling into another CPP mistake.

** XML Output
There are a couple of available extensions of Bison targeting some XML
output. Some day we should consider including them. One issue is
that they seem to be quite orthogonal to the parsing technique, and
seem to depend mostly on the possibility to have some code triggered
for each reduction. As a matter of fact, such hooks could also be
used to generate the yydebug traces. Some generic scheme probably
exists in there.

XML output for GNU Bison and gcc
   http://www.cs.may.ie/~jpower/Research/bisonXML/

XML output for GNU Bison
   http://yaxx.sourceforge.net/

* Unit rules
Maybe we could expand unit rules, i.e., transform

  exp: arith | bool;
  arith: exp '+' exp;
  bool: exp '&' exp;

into

  exp: exp '+' exp | exp '&' exp;

when there are no actions. This can significantly speed up some
grammars. I can't find the papers. In particular the book `LR
parsing: Theory and Practice' is impossible to find, but according to
`Parsing Techniques: a Practical Guide', it includes information about
this issue. Does anybody have it?

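The transformation above could be sketched as follows. This is only an
illustration under simplifying assumptions, not an algorithm taken
from the literature mentioned: it performs a single pass (chains of
unit rules are not followed), it expands a unit rule only when neither
it nor any rule of its target has an action, and the final cleanup is
not transitive. The `rule' and `grammar' types and both function names
are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

struct rule
{
  std::string lhs;
  std::vector<std::string> rhs;
  bool has_action;
};

typedef std::vector<rule> grammar;

// True if SYM has at least one rule and none of its rules has an action.
// Terminals have no rules, so they never qualify.
static bool action_free_nonterminal(const grammar& g, const std::string& sym)
{
  bool found = false;
  for (const rule& r : g)
    if (r.lhs == sym)
      {
        if (r.has_action)
          return false;
        found = true;
      }
  return found;
}

// One pass of unit-rule expansion.  Each action-free rule "A: B" is
// replaced by the rules of B rewritten with A as left-hand side.
// Rules of nonterminals that are no longer referenced are then dropped,
// except those of START.
static grammar expand_unit_rules(const grammar& g, const std::string& start)
{
  grammar expanded;
  for (const rule& r : g)
    {
      const bool unit = !r.has_action && r.rhs.size() == 1
        && r.rhs[0] != r.lhs && action_free_nonterminal(g, r.rhs[0]);
      if (!unit)
        {
          expanded.push_back(r);
          continue;
        }
      for (const rule& target : g)
        if (target.lhs == r.rhs[0])
          {
            rule copy = target;
            copy.lhs = r.lhs;
            expanded.push_back(copy);
          }
    }

  // Symbols still in use: the start symbol and every right-hand side symbol.
  std::set<std::string> used;
  used.insert(start);
  for (const rule& r : expanded)
    used.insert(r.rhs.begin(), r.rhs.end());

  grammar res;
  for (const rule& r : expanded)
    if (used.count(r.lhs))
      res.push_back(r);
  return res;
}
```

Applied to the four rules of the example with `exp' as start symbol,
this yields the two rules of the transformed grammar.
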
* Documentation

** History/Bibliography
Some history of Bison and some bibliography would be most welcome.
Are there any Texinfo standards for bibliography?

* Coding system independence
Paul notes:

    Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
    255). It also assumes that the 8-bit character encoding is
    the same for the invocation of 'bison' as it is for the
    invocation of 'cc', but this is not necessarily true when
    people run bison on an ASCII host and then use cc on an EBCDIC
    host. I don't think these topics are worth our time
    addressing (unless we find a gung-ho volunteer for EBCDIC or
    PDP-10 ports :-) but they should probably be documented
    somewhere.

    More importantly, Bison does not currently allow NUL bytes in
    tokens, either via escapes (e.g., "x\0y") or via a NUL byte in
    the source code. This should get fixed.

* --graph
Show reductions.

* Broken options ?
** %token-table
** Skeleton strategy
Must we keep %token-table?

* Precedence

** Partial order
It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders (sounds like series/parallel orders to me).

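To illustrate what a partial order would mean, here is a rough sketch.
It is only one possible model, not a design for Bison: precedence is
declared pairwise, compared by reachability, and operators from
unrelated modules are simply incomparable. The `precedence_graph'
class and its members are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

enum class order { same, lower, higher, incomparable };

class precedence_graph
{
public:
  // Declare that LOW binds less tightly than HIGH.
  void declare(const std::string& low, const std::string& high)
  {
    above_[low].insert(high);
  }

  // Compare A with B following the declared relations transitively.
  order compare(const std::string& a, const std::string& b) const
  {
    if (a == b)
      return order::same;
    std::set<std::string> seen;
    if (reaches(a, b, seen))
      return order::lower;
    seen.clear();
    if (reaches(b, a, seen))
      return order::higher;
    return order::incomparable;
  }

private:
  // Depth-first search: can TO be reached from FROM?
  // SEEN prevents visiting a symbol twice.
  bool reaches(const std::string& from, const std::string& to,
               std::set<std::string>& seen) const
  {
    if (!seen.insert(from).second)
      return false;
    auto it = above_.find(from);
    if (it == above_.end())
      return false;
    for (const std::string& next : it->second)
      if (next == to || reaches(next, to, seen))
        return true;
    return false;
  }

  std::map<std::string, std::set<std::string>> above_;
};
```

With '+' < '*' declared in one module and '|' < '&' in another,
comparing '+' with '&' gives `incomparable', where a total order would
force an arbitrary answer.
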
** RR conflicts
See if we can use precedence between rules to solve RR conflicts. See
what POSIX says.

* $undefined
From Hans:
- If the Bison generated parser experiences an undefined number in the
character range, that character is written out in diagnostic messages, in
addition to the $undefined value.

Suggest: Change the name $undefined to undefined; looks better in outputs.

* Default Action
From Hans:
- For use with my C++ parser, I transported the "switch (yyn)" statement
that Bison writes to the bison.simple skeleton file. This way, I can remove
the current default rule $$ = $1 implementation, which causes a double
assignment to $$ which may not be OK under C++, replacing it with a
"default:" part within the switch statement.

Note that the default rule $$ = $1, when typed, is perfectly OK under C,
but in the C++ implementation I made, this rule is different from
$<type_name>$ = $<type_name>1. I therefore think that one should implement
a Bison option where every typed default rule is explicitly written out
(same typed rules can of course be grouped together).

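A sketch of what such a switch could look like, as an illustration
only: this is not Hans's code nor the skeleton's. The function
`reduce_rule', its parameters and the single sample rule are
hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical semantic value type with a non-trivial assignment.
typedef std::string value_type;

// Hypothetical reduction step.  RULE is the rule number, RHS holds the
// values of the right-hand side symbols ($1 is rhs[0]).
static value_type reduce_rule(int rule, const std::vector<value_type>& rhs)
{
  value_type yyval;
  switch (rule)
    {
    case 1: // exp: exp '+' exp  { $$ = $1 + $3; }
      yyval = rhs[0] + rhs[2];
      break;

    default: // Rules without an action: $$ = $1, assigned exactly once.
      if (!rhs.empty())
        yyval = rhs[0];
      break;
    }
  return yyval;
}
```

Because $$ = $1 lives in the "default:" part, a rule with its own
action assigns $$ only once.
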
* Pre and post actions.
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
Subject: YYACT_EPILOGUE
To: bug-bison@gnu.org
X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago

The other day I had the need for explicitly building the parse tree. I
used %locations for that and defined YYLLOC_DEFAULT to call a function
that returns the tree node for the production. Easy. But I also needed
to assign the S-attribute to the tree node. That cannot be done in
YYLLOC_DEFAULT, because it is invoked before the action is executed.
The way I solved this was to define a macro YYACT_EPILOGUE that would
be invoked after the action. For reasons of symmetry I also added
YYACT_PROLOGUE. Although I had no use for that I can envision how it
might come in handy for debugging purposes.
All that is needed is to add

#if YYLSP_NEEDED
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
#else
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
#endif

at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.

I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
to bison. If you're interested, I'll work on a patch.

* Better graphics
Equip the parser with a means to create the (visual) parse tree.

* Complaint submessage indentation.
We already have an implementation that works fairly well for named
reference messages, but it would be nice to use it consistently for all
submessages from Bison. For example, the "previous definition"
submessage or the list of correct values for a %define variable might
look better with indentation.

However, the current implementation makes the assumption that the
location printed on the first line is not usually much shorter than the
locations printed on the submessage lines that follow. That assumption
may not hold true as often for some kinds of submessages, especially if
we ever support multiple grammar files.

Here's a proposal for how a new implementation might look:

  http://lists.gnu.org/archive/html/bison-patches/2009-09/msg00086.html

Local Variables:
mode: outline
coding: utf-8
End:

-----

Copyright (C) 2001-2004, 2006, 2008-2012 Free Software Foundation, Inc.

This file is part of Bison, the GNU Compiler Compiler.

This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.