-*- outline -*-

* URGENT: Prologue
The %union is declared after the user C declarations. This is a problem
when the user part needs YYSTYPE, which is then declared too late.

Actually, the real problem seems to be that the %union ought to be output
where it was defined. For instance, in gettext/intl/plural.y, we
have:

        %{
        ...
        #include "gettextP.h"
        ...
        %}

        %union {
          unsigned long int num;
          enum operator op;
          struct expression *exp;
        }

        %{
        ...
        static int yylex PARAMS ((YYSTYPE *lval, const char **pexp));
        ...
        %}

Here the first part defines struct expression, the second uses it to
define YYSTYPE, and the last uses YYSTYPE. Only this order is valid.

Note that we have the same problem with GCC.

I suggest splitting the prologue into a pre-prologue and a post-prologue,
for two reasons:

1. we keep language independence, as it is the skeleton that joins the
two prologues (there is no need for the engine to encode union yystype
and to output it inside the prologue, which breaks the language
independence of the generator);

2. it makes it possible to have several %union declarations in the
input. I think this is a pleasant (though currently useless) feature,
but in the future I want a means to %include other bits of grammars,
and _then_ it will be important for the various bits to define their
needs in %union.
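
For illustration, here is a hand-written sketch of the C that the plural.y example needs, laid out in input order: pre-prologue, then YYSTYPE where the %union appears, then post-prologue. The enum and struct are hypothetical minimal stand-ins for what gettextP.h provides. The union member of type `enum operator` requires the complete enum above it, because standard C has no incomplete enum types. `yylex` is declared without PARAMS. Its toy body (read one decimal number) is an assumption, added only so the sketch compiles on its own.

```c
/* Pre-prologue: hypothetical stand-ins for what gettextP.h declares. */
enum operator { op_num, op_plus };
struct expression
{
  enum operator operation;
  unsigned long int val;
};

/* %union, emitted where it was defined.  The member of type
   enum operator needs the complete enum declared above.  */
typedef union
{
  unsigned long int num;
  enum operator op;
  struct expression *exp;
} YYSTYPE;

/* Post-prologue: may now refer to YYSTYPE.  */
static int yylex (YYSTYPE *lval, const char **pexp);

/* Toy scanner (hypothetical): read one decimal number from *PEXP into
   LVAL->num and advance *PEXP.  Returns 1 on success, 0 otherwise.  */
static int
yylex (YYSTYPE *lval, const char **pexp)
{
  const char *p = *pexp;
  unsigned long int n = 0;

  if (*p < '0' || *p > '9')
    return 0;
  while (*p >= '0' && *p <= '9')
    n = n * 10 + (unsigned long int) (*p++ - '0');
  lval->num = n;
  *pexp = p;
  return 1;
}
```

Moving the typedef below the yylex declaration, which is where it ends up when the %union is output after all user declarations, makes that declaration fail to compile.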

* Coding system independence
Paul notes:

        Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
        255). It also assumes that the 8-bit character encoding is
        the same for the invocation of 'bison' as it is for the
        invocation of 'cc', but this is not necessarily true when
        people run bison on an ASCII host and then use cc on an EBCDIC
        host. I don't think these topics are worth our time
        addressing (unless we find a gung-ho volunteer for EBCDIC or
        PDP-10 ports :-) but they should probably be documented
        somewhere.
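
Besides documenting these assumptions, a parser could check them cheaply when it is compiled. The sketch below assumes that bison itself ran on an ASCII host.

- The `#if` rejects hosts whose bytes are not 8 bits.
- The hypothetical helper `charset_looks_like_ascii` compares a few characters with their ASCII codes. This fails under EBCDIC, where 'A' is 193.

It can only see the character set of the compiler, not that of the machine that ran bison.

```c
#include <limits.h>

/* Generated tables assume 8-bit bytes; refuse to build otherwise.  */
#if UCHAR_MAX != 255
# error "this parser assumes 8-bit bytes (UCHAR_MAX == 255)"
#endif

/* Hypothetical check: return 1 if the execution character set agrees
   with ASCII on a few representative characters, 0 otherwise
   (for instance on an EBCDIC host).  */
int
charset_looks_like_ascii (void)
{
  return 'A' == 65 && 'a' == 97 && '0' == 48 && '(' == 40 && '~' == 126;
}
```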

* Using enums instead of int for tokens.
Paul suggests:

        #ifndef YYTOKENTYPE
        # if defined (__STDC__) || defined (__cplusplus)
           /* Put the tokens into the symbol table, so that GDB and other debuggers
              know about them.  */
           enum yytokentype {
             FOO = 256,
             BAR,
             ...
           };
           /* POSIX requires `int' for tokens in interfaces.  */
        #  define YYTOKENTYPE int
        # endif
        #endif
        #define FOO 256
        #define BAR 257
        ...

> I'm in favor of
>
>       %token FOO 256
>       %token BAR 257
>
> and Bison moves error into 258.

Yes, I think that's a valid extension too, if the user doesn't define
the token number for error.
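
Filled in with two concrete tokens, Paul's scheme compiles as shown below.

- FOO and BAR are the placeholder names from the discussion.
- `next_token` is a hypothetical scanner stub.
- The enum is emitted before the `#define`s, so the macros cannot rewrite the enumerator names.
- YYTOKENTYPE both guards against a second definition and names the `int` type used in the interface.

```c
#ifndef YYTOKENTYPE
# if defined (__STDC__) || defined (__cplusplus)
   /* Put the tokens into the symbol table, so that GDB and other
      debuggers know about them.  */
   enum yytokentype {
     FOO = 256,
     BAR                        /* 257 */
   };
   /* POSIX requires `int' for tokens in interfaces.  */
#  define YYTOKENTYPE int
# endif
#endif
/* Traditional macros, for code that still expects them.  */
#define FOO 256
#define BAR 257

/* Hypothetical scanner stub: the interface type stays int.  */
int
next_token (int i)
{
  return i == 0 ? FOO : BAR;
}
```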

* Output directory
Akim:

| I consider this to be a bug in bison:
|
| /tmp % mkdir src
| /tmp % cp ~/src/bison/tests/calc.y src
| /tmp % mkdir build && cd build
| /tmp/build % bison ../src/calc.y
| /tmp/build % cd ..
| /tmp % ls -l build src
| build:
| total 0
|
| src:
| total 32
| -rw-r--r-- 1 akim lrde 27553 oct 2 16:31 calc.tab.c
| -rw-r--r-- 1 akim lrde 3335 oct 2 16:31 calc.y
|
|
| Would it be safe to change this behavior to something more reasonable?
| Do you think some people depend upon this?

Jim:

Is that behavior documented?
If so, then it's probably not reasonable to change it.
I've Cc'd the automake list, because some of automake's
rules use bison through $(YACC) -- though I'll bet they
all use it in yacc-compatible mode.

Pavel:

Hello, Jim and others!

> Is that behavior documented?
> If so, then it's probably not reasonable to change it.
> I've Cc'd the automake list, because some of automake's
> rules use bison through $(YACC) -- though I'll bet they
> all use it in yacc-compatible mode.

Yes, Automake currently uses bison in yacc-compatible mode, but it
would be fair for Automake to switch to the native mode as long as the
processed files are distributed and "missing" emulates bison.

In any case, the makefiles should specify the output file explicitly
instead of relying on weird defaults.

> | src:
> | total 32
> | -rw-r--r-- 1 akim lrde 27553 oct 2 16:31 calc.tab.c
> | -rw-r--r-- 1 akim lrde 3335 oct 2 16:31 calc.y

This is not as ugly as it seems: with Automake you want to put
sources where they belong, in the source directory.

> | This is not as ugly as it seems: with Automake you want to put
> | sources where they belong, in the source directory.
>
> The difference source/build you are referring to is based on Automake
> concepts. They make no sense at all for tools such as bison or gcc.
> They have input and output. I do not want them to try to grasp
> source/build. I want them to behave uniformly: output *here*.

I realize that.

It's unfortunate that the native mode of Bison behaves in a less uniform
way than the yacc mode. I agree with your point. Bison maintainers may
want to fix it along with the documentation.
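
The two behaviors under discussion differ only in whether the output name keeps the grammar's directory. The naming rule is sketched below with a hypothetical helper, `parser_file_name`. It assumes `/` as the directory separator and handles only the `.y` to `.tab.c` case.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: compute the parser file name for the grammar
   INPUT into BUF.  If HERE is zero, keep INPUT's directory (the
   behavior in Akim's transcript); otherwise drop it, so the file lands
   in the current directory.  Returns 0 on success, -1 if BUF is too
   small.  */
int
parser_file_name (const char *input, int here, char *buf, size_t size)
{
  const char *base = input;
  size_t len;
  int n;

  if (here)
    {
      const char *slash = strrchr (input, '/');
      if (slash)
        base = slash + 1;
    }
  len = strlen (base);
  /* Strip a trailing ".y" extension, if any.  */
  if (len >= 2 && strcmp (base + len - 2, ".y") == 0)
    len -= 2;
  n = snprintf (buf, size, "%.*s.tab.c", (int) len, base);
  return n >= 0 && (size_t) n < size ? 0 : -1;
}
```

With `../src/calc.y`, HERE = 0 yields `../src/calc.tab.c` and HERE = 1 yields `calc.tab.c`.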

* Unit rules
Maybe we could expand unit rules, i.e., transform

        exp: arith | bool;
        arith: exp '+' exp;
        bool: exp '&' exp;

into

        exp: exp '+' exp | exp '&' exp;

when there are no actions. This can significantly speed up some
grammars.
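
The sketch below performs this transformation over a toy grammar representation invented for the example (`struct rule`, with each right-hand side as a space-separated string). It expands one level of action-less unit rules A: B by copying B's rules under A. Chains of unit rules, and removal of rules that become unused, are left out.

```c
#include <string.h>

/* Hypothetical rule representation: LHS: RHS, with RHS a
   space-separated list of symbols.  */
struct rule
{
  const char *lhs;
  const char *rhs;
  int has_action;
};

/* Return 1 if R is an action-less unit rule A: B, where B is a
   nonterminal (the left-hand side of some rule of G).  */
static int
is_unit_rule (const struct rule *r, const struct rule *g, int n)
{
  int i;
  if (r->has_action || strchr (r->rhs, ' '))
    return 0;
  for (i = 0; i < n; i++)
    if (strcmp (g[i].lhs, r->rhs) == 0)
      return 1;
  return 0;
}

/* Write into OUT (room for N * N rules) the grammar G of N rules with
   each unit rule A: B replaced by copies of B's rules whose left-hand
   side is A.  B's own rules are kept.  Returns the number of rules.  */
int
expand_unit_rules (const struct rule *g, int n, struct rule *out)
{
  int i, j, m = 0;
  for (i = 0; i < n; i++)
    {
      if (!is_unit_rule (&g[i], g, n))
        {
          out[m++] = g[i];
          continue;
        }
      for (j = 0; j < n; j++)
        if (strcmp (g[j].lhs, g[i].rhs) == 0)
          {
            out[m] = g[j];
            out[m].lhs = g[i].lhs;
            m++;
          }
    }
  return m;
}
```

On the grammar above, `exp: arith` and `exp: bool` become `exp: exp '+' exp` and `exp: exp '&' exp`, while the arith and bool rules are kept unchanged.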

* Stupid error messages
An example shows it easily:

src/bison/tests % ./testsuite -k calc,location,error-verbose -l
GNU Bison 1.49a test suite test groups:

 NUM: FILENAME:LINE TEST-GROUP-NAME
 KEYWORDS

 51: calc.at:440 Calculator --locations --yyerror-verbose
 52: calc.at:442 Calculator --defines --locations --name-prefix=calc --verbose --yacc --yyerror-verbose
 54: calc.at:445 Calculator --debug --defines --locations --name-prefix=calc --verbose --yacc --yyerror-verbose
src/bison/tests % ./testsuite 51 -d
## --------------------------- ##
## GNU Bison 1.49a test suite. ##
## --------------------------- ##
 51: calc.at:440 ok
## ---------------------------- ##
## All 1 tests were successful. ##
## ---------------------------- ##
src/bison/tests % cd ./testsuite.dir/51
tests/testsuite.dir/51 % echo "()" | ./calc
1.2-1.3: parse error, unexpected ')', expecting error or "number" or '-' or '('
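
The complaint is presumably that `error` is listed among the expected tokens, although users can never type it. The sketch below shows a message builder that skips it. `format_parse_error` and its arguments are hypothetical, not the skeleton's actual code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: write "parse error, unexpected U, expecting
   A or B ..." into BUF (SIZE > 0), skipping the "error" token.
   Output is truncated to fit BUF.  Returns the message length, or -1
   on an output error.  */
int
format_parse_error (char *buf, size_t size, const char *unexpected,
                    const char *const *expected, int nexpected)
{
  int i, listed = 0;
  size_t len;

  if (snprintf (buf, size, "parse error, unexpected %s", unexpected) < 0)
    return -1;
  for (i = 0; i < nexpected; i++)
    {
      if (strcmp (expected[i], "error") == 0)
        continue;               /* internal token: never suggest it */
      len = strlen (buf);
      if (snprintf (buf + len, size - len, "%s%s",
                    listed++ ? " or " : ", expecting ", expected[i]) < 0)
        return -1;
    }
  return (int) strlen (buf);
}
```

Given the tokens from the transcript, it produces `parse error, unexpected ')', expecting "number" or '-' or '('`.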

* yyerror, yyprint interface
It should be improved, in particular when using Bison features such as
locations and YYPARSE_PARAMS. For the time being, it is recommended
to #define yyerror and yyprint to steal internal variables...

* read_pipe.c
This is not portable to DOS, for instance. Implement a more portable
scheme. Sources of inspiration include GNU diff and Free Recode.

* Memory leaks in the generator
A round of memory leak clean-ups would be most welcome. Dmalloc,
Checker GCC, Electric Fence, or Valgrind: you choose your tool.

* Memory leaks in the parser
The same applies to the generated parsers. In particular, this is
critical for user data: when aborting a parse, when handling the
error token, etc., we often throw away yylval without giving the user
a chance to clean it up.
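
One way to give users that chance is a cleanup hook, called for every value still on the stack when the parser gives up. The sketch assumes heap-allocated semantic values stored as `void *` in a fixed-size stack. `struct value_stack`, `value_destructor` and `discard_stack` are hypothetical names.

```c
#include <stdlib.h>

/* Hypothetical user hook: release one semantic value.  */
typedef void (*value_destructor) (void *value);

struct value_stack
{
  void *values[64];
  int top;                      /* number of values on the stack */
};

/* Pop every remaining value and pass it to DESTROY (if non-null), as a
   parser could do before returning from an abort.  Returns the number
   of values discarded.  */
int
discard_stack (struct value_stack *s, value_destructor destroy)
{
  int n = 0;
  while (s->top > 0)
    {
      void *v = s->values[--s->top];
      if (destroy)
        destroy (v);
      n++;
    }
  return n;
}
```

Passing `free` as the hook is enough for plain malloc'd values. A real hook would also need to know each value's type.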

* --graph
Show reductions. []

* Broken options?
** %no-lines [ok]
** %no-parser []
** %pure-parser []
** %semantic-parser []
** %token-table []
** Options which could use parse_dquoted_param ().
Maybe transferred to lex.c.
*** %skeleton [ok]
*** %output []
*** %file-prefix []
*** %name-prefix []

** Skeleton strategy. []
Must we keep %no-parser?
             %token-table?
*** New skeletons. []

* src/print_graph.c
Find the best graph parameters. []

* doc/bison.texinfo
** Update
information about ERROR_VERBOSE. []
** Add explanations about
skeleton muscles. []
%skeleton. []

* testsuite
** tests/pure-parser.at []
New tests.

* Debugging parsers

From Greg McGary:

akim demaille <akim.demaille@epita.fr> writes:

> With great pleasure! Nonetheless, things which are debatable
> (or not, but just `big') should be discussed in `public': something
> like help- or bug-bison@gnu.org is just fine. Jesse and I are there,
> but there is also Jim and some other people.

I have no idea whether it qualifies as big or controversial, so I'll
just summarize for you. I proposed this change years ago and was
surprised that it was met with utter indifference!

This debug feature is for the programs/grammars one develops with
bison, not for debugging bison itself. I find that the YYDEBUG
output comes in a very inconvenient format for my purposes.
When debugging gcc, for instance, what I want is to see a trace of
the sequence of reductions and the line#s for the semantic actions
so I can follow what's happening. Single-stepping in gdb doesn't cut it,
because moving from one semantic action to the next takes you through
lots of internal machinery of the parser, which is uninteresting.

The change I made was to the format of the debug output, so that it
comes out in the format of C error messages, digestible by emacs
compile mode, like so:

grammar.y:1234: foo: bar(0x123456) baz(0x345678)

where "foo: bar baz" is the reduction rule, whose semantic action
appears on line 1234 of the bison grammar file grammar.y. The hex
numbers on the rhs tokens are the parse-stack values associated with
those tokens. Of course, yytype might be something totally
incompatible with that representation, but for the most part, yytype
values are single words (scalars or pointers). In the case of gcc,
they're most often pointers to tree nodes. Come to think of it, the
right thing to do is to make the printing of stack values
user-definable. It would also be useful to include the filename &
line# of the file being parsed, but the main filename & line# should
continue to be that of grammar.y.

Anyway, this feature has saved my life on numerous occasions. The way
I customarily use it is to first run bison with the traces on, isolate
the sequence of reductions that interests me, put those traces in a
buffer and force it into compile-mode, then visit each of those lines
in the grammar and set breakpoints with C-x SPACE. Then, I can run
again under the control of gdb and stop at each semantic action.
With the hex addresses of tree nodes, I can inspect the values
associated with any rhs token.

You like?
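
The trace line format Greg describes can be sketched as follows. The sketch assumes the stack values are scalars or pointers that fit in `uintptr_t`. `format_reduction` is a hypothetical helper; the generated parser would have to supply the grammar file name, the action's line, and the symbol names.

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: write "FILE:LINE: LHS: SYM(0xVALUE) ..." into
   BUF, one SYM(0xVALUE) per rhs symbol.  Returns the length written,
   or -1 on an output error or if BUF is too small.  */
int
format_reduction (char *buf, size_t size, const char *file, int line,
                  const char *lhs, const char *const *rhs,
                  const uintptr_t *values, int n)
{
  int i, k;
  size_t len;

  k = snprintf (buf, size, "%s:%d: %s:", file, line, lhs);
  if (k < 0 || (size_t) k >= size)
    return -1;
  len = (size_t) k;
  for (i = 0; i < n; i++)
    {
      k = snprintf (buf + len, size - len, " %s(0x%" PRIxPTR ")",
                    rhs[i], values[i]);
      if (k < 0 || (size_t) k >= size - len)
        return -1;
      len += (size_t) k;
    }
  return (int) len;
}
```

Feeding it the rule and values from Greg's example reproduces `grammar.y:1234: foo: bar(0x123456) baz(0x345678)`.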

* input synclines
Some users create their foo.y files and equip them with #line
directives. Bison should recognize these and preserve them.
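
Recognizing such a directive is the first step. The sketch below shows a hypothetical `parse_syncline` that accepts `#line N` with an optional quoted file name. Escaped quotes inside the name, and spellings such as `# line`, are not handled.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: if TEXT is a "#line N" or "#line N "FILE""
   directive, store N in *LINE and FILE (or "" if absent) in FILE,
   which has room for FILESIZE bytes, and return 1; else return 0.  */
int
parse_syncline (const char *text, int *line, char *file, size_t filesize)
{
  const char *start, *end;
  size_t len;
  int n = 0;

  if (filesize == 0 || sscanf (text, " #line %d%n", line, &n) != 1)
    return 0;
  file[0] = '\0';               /* the file name is optional */
  start = strchr (text + n, '"');
  if (!start)
    return 1;
  end = strchr (start + 1, '"');
  if (!end)
    return 0;                   /* unterminated file name */
  len = (size_t) (end - start - 1);
  if (len >= filesize)
    return 0;                   /* FILE is too small */
  memcpy (file, start + 1, len);
  file[len] = '\0';
  return 1;
}
```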

* BTYacc
See if we can integrate backtracking into Bison. Contact the BTYacc
maintainers.

* Automaton report
Display the lookaheads for each item more clearly.

* RR conflicts
See if we can use precedence between rules to solve RR conflicts. See
what POSIX says.

* Precedence
It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders.

* Parsing grammars
Rewrite the reader in Bison.

* Problems with aliases
From: "Baum, Nathan I" <s0009525@chelt.ac.uk>
Subject: Token Alias Bug
To: "'bug-bison@gnu.org'" <bug-bison@gnu.org>

I've noticed a bug in bison. Sadly, our eternally wise sysadmins won't let
us use CVS, so I can't find out if it's been fixed already...

Basically, I made a program (in flex) that went through a .y file looking
for "..."-tokens, and then output a %token line for it. For
single-character ""-tokens, I reasoned, I could just use
[%token 'A' "A"]. However, this causes Bison to output a [#define 'A' 65],
which cpp chokes on, not unreasonably. (And even if cpp didn't choke, I
obviously wouldn't want (char)'A' to be replaced with (int)65 throughout my
code.)

Bison normally forgoes outputting a #define for a character token. However,
it always outputs an aliased token -- even if the token is an alias for a
character token. We don't want that. The problem is in /output.c/, as I
recall. When it outputs the token definitions, it checks for a character
token, and then checks for an alias token. If the character token check is
placed after the alias check, then it works correctly.

Alias tokens seem to be something of a kludge. What about an [%alias "..."]
command...

        %alias T_IF "IF"

Hmm. I can't help thinking... What about a --generate-lex option that
creates an .l file for the alias tokens used... (Or an option to make a
gperf file, etc...)
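
Instead of relying on the order of the two checks, the output step could test the name it is about to define. The sketch below shows that variant. It is a swapped-in technique, not the fix Nathan describes. `token_define` and `is_identifier` are hypothetical helpers. Only a C identifier gets a #define, so neither 'A' nor a string alias such as "IF" can produce one.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if NAME is a C identifier, which rejects character
   literals such as 'A' and string aliases such as "IF".  */
static int
is_identifier (const char *name)
{
  if (!(isalpha ((unsigned char) *name) || *name == '_'))
    return 0;
  for (name++; *name; name++)
    if (!(isalnum ((unsigned char) *name) || *name == '_'))
      return 0;
  return 1;
}

/* Hypothetical output step: NAME is the symbol chosen for token NUMBER
   after alias resolution.  Writes "#define NAME NUMBER" into BUF and
   returns 1, or returns 0 when no #define should be emitted.  */
int
token_define (const char *name, int number, char *buf, size_t size)
{
  if (!is_identifier (name))
    return 0;
  snprintf (buf, size, "#define %s %d", name, number);
  return 1;
}
```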

* Presentation of the report file
From: "Baum, Nathan I" <s0009525@chelt.ac.uk>
Subject: Token Alias Bug
To: "'bug-bison@gnu.org'" <bug-bison@gnu.org>

I've also noticed something that, whilst not *wrong*, is inconvenient: I
use the verbose mode to help find the causes of unresolved shift/reduce
conflicts. However, this mode insists on starting the .output file with a
list of *resolved* conflicts, something I find quite useless. Might it be
possible to define a -v mode and a -vv mode, where the -vv mode shows
everything, but the -v mode only tells you what you need for examining
conflicts? (Or, perhaps, a "*** This state has N conflicts ***" marker above
each state with conflicts.)

* $undefined
From Hans:
- If the Bison-generated parser experiences an undefined number in the
character range, that character is written out in diagnostic messages, in
addition to the $undefined value.

Suggest: Change the name $undefined to undefined; looks better in outputs.

* Default Action
From Hans:
- For use with my C++ parser, I moved the "switch (yyn)" statement
that Bison writes into the bison.simple skeleton file. This way, I can remove
the current default rule $$ = $1 implementation, which causes a double
assignment to $$ which may not be OK under C++, replacing it with a
"default:" part within the switch statement.

Note that the default rule $$ = $1, when typed, is perfectly OK under C,
but in the C++ implementation I made, this rule is different from
$<type_name>$ = $<type_name>1. I therefore think that one should implement
a Bison option where every typed default rule is explicitly written out
(rules with the same type can of course be grouped together).
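
The sketch below shows Hans's layout in C, with int semantic values for brevity. The default $$ = $1 is not assigned before the switch and then overwritten by the action. Instead it moves into the "default:" branch, so each reduction assigns $$ once. The rule numbers and `reduce` are hypothetical. `yyvsp[1 - yylen]` is how the C skeleton reaches $1.

```c
/* Hypothetical reduction routine: YYVSP points at the top of an int
   value stack, YYLEN is the length of the rule's right-hand side.  */
int
reduce (int yyn, const int *yyvsp, int yylen)
{
  int yyval;
  switch (yyn)
    {
    case 2:                     /* exp: exp '+' exp  { $$ = $1 + $3; } */
      yyval = yyvsp[-2] + yyvsp[0];
      break;
    default:                    /* rules without an action: $$ = $1 */
      yyval = yylen > 0 ? yyvsp[1 - yylen] : 0;  /* empty rules: no $1 */
      break;
    }
  return yyval;
}
```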

* Pre and post actions.
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
Subject: YYACT_EPILOGUE
To: bug-bison@gnu.org

The other day I had the need for explicitly building the parse tree. I
used %locations for that and defined YYLLOC_DEFAULT to call a function
that returns the tree node for the production. Easy. But I also needed
to assign the S-attribute to the tree node. That cannot be done in
YYLLOC_DEFAULT, because it is invoked before the action is executed.
The way I solved this was to define a macro YYACT_EPILOGUE that would
be invoked after the action. For reasons of symmetry I also added
YYACT_PROLOGUE. Although I had no use for that, I can envision how it
might come in handy for debugging purposes.
All that is needed is to add

#if YYLSP_NEEDED
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
#else
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
#endif

at the proper place in bison.simple. Ditto for YYACT_PROLOGUE.

I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
to bison. If you're interested, I'll work on a patch.
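
The sketch below illustrates the proposal using the no-locations form and int semantic values. The skeleton supplies do-nothing defaults, a user may define either macro first, and both run around the action. `reduce_plus`, `last_value` and `last_len` are hypothetical. Here the user's epilogue merely records what it saw, where Florian's would attach the value to a tree node.

```c
/* User side (hypothetical): record what the epilogue sees.  */
static int last_value;
static int last_len = -1;
#define YYACT_EPILOGUE(Val, Rhs, Len) \
  ((void) (Rhs), last_value = (Val), last_len = (Len))

/* Skeleton side: defaults used when the user defines nothing.  */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Val, Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

/* One reduction of exp: exp '+' exp.  YYVSP points at the top of an
   int value stack with at least one entry below $1, so that
   yyvsp - yylen stays inside the stack.  */
int
reduce_plus (int *yyvsp)
{
  int yylen = 3;
  int yyval = yyvsp[1 - yylen];                 /* default $$ = $1 */

  YYACT_PROLOGUE (yyval, yyvsp - yylen, yylen);
  yyval = yyvsp[-2] + yyvsp[0];                 /* action: $$ = $1 + $3 */
  YYACT_EPILOGUE (yyval, yyvsp - yylen, yylen);
  return yyval;
}
```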

-----

Copyright (C) 2001, 2002 Free Software Foundation, Inc.

This file is part of GNU Autoconf.

GNU Autoconf is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Autoconf is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with autoconf; see the file COPYING. If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.