-*- outline -*-

* URGENT: Prologue
The %union is declared after the user C declarations. It can be
a problem if YYSTYPE is declared after the user part.

Actually, the real problem seems to be that the %union ought to be
output where it was defined. For instance, in gettext/intl/plural.y,
we have:

        %{
        ...
        #include "gettextP.h"
        ...
        %}

        %union {
          unsigned long int num;
          enum operator op;
          struct expression *exp;
        }

        %{
        ...
        static int yylex PARAMS ((YYSTYPE *lval, const char **pexp));
        ...
        %}

Where the first part defines struct expression, the second uses it to
define YYSTYPE, and the last uses YYSTYPE. Only this order is valid.

Note that we have the same problem with GCC.

I suggest splitting the prologue into pre-prologue and post-prologue.
The reason is that:

1. we keep language independence as it is the skeleton that joins the
two prologues (there is no need for the engine to encode union yystype
and to output it inside the prologue, which breaks the language
independence of the generator)

2. that makes it possible to have several %union in input. I think
this is a pleasant (but currently useless) feature, but in the future,
I want a means to %include other bits of grammars, and _then_ it will
be important for the various bits to define their needs in %union.

When implementing multiple-%union support, bear the following in mind:

- under --yacc, this must be flagged as an error. Don't make it fatal
  though.

- The #line must now appear *inside* the definition of yystype.
  Something like

        {
        #line 12 "foo.y"
          int ival;
        #line 23 "foo.y"
          char *sval;
        }

* Language independent actions

Currently bison, the generator, transforms $1, $$ and so forth into
direct C code, manipulating the stacks. This is problematic, because
(i) it means that if we want more languages, we need to update the
generator, and (ii), it forces names everywhere (e.g., the C++
skeleton would be happy to use other naming schemes, and actually,
even other accessing schemes).

Therefore we want

1. the generator to replace $1, etc. by M4 macro invocations
   (b4_dollar(1), b4_at(3), b4_dollar_dollar) etc.

2. the skeletons to define these macros.

But currently the actions are double-quoted, to protect them from M4
evaluation. So we need to:

3. stop quoting them

4. change the [ and ] in the actions into @<:@ and @:>@

5. extend the postprocessor to map these back onto [ and ].

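Steps 4 and 5 amount to a reversible rewrite of the action text. A
minimal sketch in C, assuming only the two quadrigraphs named above;
the function names quote_brackets and unquote_brackets are
hypothetical and are not part of Bison:

```c
#include <stddef.h>
#include <string.h>

/* Copy IN to OUT, replacing '[' with "@<:@" and ']' with "@:>@" so
   that M4 no longer sees the brackets as quotes.  Return 0 on
   success, -1 if OUT (of OUTSZ bytes) is too small.  */
int
quote_brackets (const char *in, char *out, size_t outsz)
{
  size_t o = 0;
  for (; *in; ++in)
    {
      const char *rep = *in == '[' ? "@<:@" : *in == ']' ? "@:>@" : NULL;
      size_t len = rep ? 4 : 1;
      if (o + len >= outsz)     /* Keep one byte for the terminator.  */
        return -1;
      if (rep)
        memcpy (out + o, rep, len);
      else
        out[o] = *in;
      o += len;
    }
  if (o >= outsz)
    return -1;
  out[o] = '\0';
  return 0;
}

/* The postprocessor side: map the quadrigraphs back onto '[' and ']'.
   Same return convention as quote_brackets.  */
int
unquote_brackets (const char *in, char *out, size_t outsz)
{
  size_t o = 0;
  while (*in)
    {
      char c = *in;
      size_t adv = 1;
      if (strncmp (in, "@<:@", 4) == 0)
        c = '[', adv = 4;
      else if (strncmp (in, "@:>@", 4) == 0)
        c = ']', adv = 4;
      if (o + 1 >= outsz)
        return -1;
      out[o++] = c;
      in += adv;
    }
  if (o >= outsz)
    return -1;
  out[o] = '\0';
  return 0;
}
```

This sketch has one known gap: an action that already contains the
literal text @<:@ would be decoded as [ on the way back. A real
implementation would need some escape for @ itself as well.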
* Coding system independence
Paul notes:

        Currently Bison assumes 8-bit bytes (i.e. that UCHAR_MAX is
        255). It also assumes that the 8-bit character encoding is
        the same for the invocation of 'bison' as it is for the
        invocation of 'cc', but this is not necessarily true when
        people run bison on an ASCII host and then use cc on an EBCDIC
        host. I don't think these topics are worth our time
        addressing (unless we find a gung-ho volunteer for EBCDIC or
        PDP-10 ports :-) but they should probably be documented
        somewhere.

* Using enums instead of int for tokens.
Paul suggests:

        #ifndef YYTOKENTYPE
        # if defined (__STDC__) || defined (__cplusplus)
           /* Put the tokens into the symbol table, so that GDB and other debuggers
              know about them. */
           enum yytokentype {
             FOO = 256,
             BAR,
             ...
           };
           /* POSIX requires `int' for tokens in interfaces. */
        #  define YYTOKENTYPE int
        # endif
        #endif
        #define FOO 256
        #define BAR 257
        ...

* Output directory
Akim:

| I consider this to be a bug in bison:
|
| /tmp % mkdir src
| /tmp % cp ~/src/bison/tests/calc.y src
| /tmp % mkdir build && cd build
| /tmp/build % bison ../src/calc.y
| /tmp/build % cd ..
| /tmp % ls -l build src
| build:
| total 0
|
| src:
| total 32
| -rw-r--r-- 1 akim lrde 27553 oct 2 16:31 calc.tab.c
| -rw-r--r-- 1 akim lrde 3335 oct 2 16:31 calc.y
|
|
| Would it be safe to change this behavior to something more reasonable?
| Do you think some people depend upon this?

Jim:

Is that behavior documented?
If so, then it's probably not reasonable to change it.
I've Cc'd the automake list, because some of automake's
rules use bison through $(YACC) -- though I'll bet they
all use it in yacc-compatible mode.

Pavel:

Hello, Jim and others!

> Is that behavior documented?
> If so, then it's probably not reasonable to change it.
> I've Cc'd the automake list, because some of automake's
> rules use bison through $(YACC) -- though I'll bet they
> all use it in yacc-compatible mode.

Yes, Automake currently uses bison in yacc-compatible mode, but it
would be fair for Automake to switch to the native mode as long as the
processed files are distributed and "missing" emulates bison.

In any case, the makefiles should specify the output file explicitly
instead of relying on weird defaults.

> | src:
> | total 32
> | -rw-r--r-- 1 akim lrde 27553 oct 2 16:31 calc.tab.c
> | -rw-r--r-- 1 akim lrde 3335 oct 2 16:31 calc.y

This is not as ugly as it seems - with Automake you want to put
sources where they belong - in the source directory.

> | This is not as ugly as it seems - with Automake you want to put
> | sources where they belong - in the source directory.
>
> The source/build distinction you are referring to is based on Automake
> concepts. They make no sense at all for tools such as bison or gcc
> etc. They have input and output. I do not want them to try to grasp
> source/build. I want them to behave uniformly: output *here*.

I realize that.

It's unfortunate that the native mode of Bison behaves in a less uniform
way than the yacc mode. I agree with your point. Bison maintainers may
want to fix it along with the documentation.

* Unit rules
Maybe we could expand unit rules, i.e., transform

        exp: arith | bool;
        arith: exp '+' exp;
        bool: exp '&' exp;

into

        exp: exp '+' exp | exp '&' exp;

when there are no actions. This can significantly speed up some
grammars.

* Stupid error messages
An example shows it easily:

src/bison/tests % ./testsuite -k calc,location,error-verbose -l
GNU Bison 1.49a test suite test groups:

 NUM: FILENAME:LINE      TEST-GROUP-NAME
      KEYWORDS

  51: calc.at:440        Calculator --locations --yyerror-verbose
  52: calc.at:442        Calculator --defines --locations --name-prefix=calc --verbose --yacc --yyerror-verbose
  54: calc.at:445        Calculator --debug --defines --locations --name-prefix=calc --verbose --yacc --yyerror-verbose
src/bison/tests % ./testsuite 51 -d
## --------------------------- ##
## GNU Bison 1.49a test suite. ##
## --------------------------- ##
  51: calc.at:440        ok
## ---------------------------- ##
## All 1 tests were successful. ##
## ---------------------------- ##
src/bison/tests % cd ./testsuite.dir/51
tests/testsuite.dir/51 % echo "()" | ./calc
1.2-1.3: parse error, unexpected ')', expecting error or "number" or '-' or '('

* yyerror, yyprint interface
It should be improved, in particular when using Bison features such as
locations, and YYPARSE_PARAM. For the time being, it is recommended
to #define yyerror and yyprint to steal internal variables...

* read_pipe.c
This is not portable to DOS for instance. Implement a more portable
scheme. Sources of inspiration include GNU diff, and Free Recode.

* Memory leaks in the generator
A round of memory leak clean ups would be most welcome. Dmalloc,
Checker GCC, Electric Fence, or Valgrind: choose your tool.

* Memory leaks in the parser
The same applies to the generated parsers. In particular, this is
critical for user data: when aborting a parse, when handling the
error token etc., we often throw away yylval without giving the user
a chance to clean it up.

* --graph
Show reductions.        []

* Broken options ?
** %no-lines            [ok]
** %no-parser           []
** %pure-parser         []
** %semantic-parser     []
** %token-table         []
** Options which could use parse_dquoted_param ().
Maybe transferred to lex.c.
*** %skeleton           [ok]
*** %output             []
*** %file-prefix        []
*** %name-prefix        []

** Skeleton strategy.   []
Must we keep %no-parser?
             %token-table?
*** New skeletons.      []

* src/print_graph.c
Find the best graph parameters. []

* doc/bison.texinfo
** Update
information about ERROR_VERBOSE.        []
** Add explanations about
skeleton muscles.       []
%skeleton.              []

* testsuite
** tests/pure-parser.at []
New tests.

* Debugging parsers

From Greg McGary:

akim demaille <akim.demaille@epita.fr> writes:

> With great pleasure! Nonetheless, things which are debatable
> (or not, but just `big') should be discussed in `public': something
> like help- or bug-bison@gnu.org is just fine. Jesse and I are there,
> but there is also Jim and some other people.

I have no idea whether it qualifies as big or controversial, so I'll
just summarize for you. I proposed this change years ago and was
surprised that it was met with utter indifference!

This debug feature is for the programs/grammars one develops with
bison, not for debugging bison itself. I find that the YYDEBUG
output comes in a very inconvenient format for my purposes.
When debugging gcc, for instance, what I want is to see a trace of
the sequence of reductions and the line#s for the semantic actions
so I can follow what's happening. Single-step in gdb doesn't cut it
because to move from one semantic action to the next takes you through
lots of internal machinery of the parser, which is uninteresting.

The change I made was to the format of the debug output, so that it
comes out in the format of C error messages, digestible by emacs
compile mode, like so:

grammar.y:1234: foo: bar(0x123456) baz(0x345678)

where "foo: bar baz" is the reduction rule, whose semantic action
appears on line 1234 of the bison grammar file grammar.y. The hex
numbers on the rhs tokens are the parse-stack values associated with
those tokens. Of course, yytype might be something totally
incompatible with that representation, but for the most part, yytype
values are single words (scalars or pointers). In the case of gcc,
they're most often pointers to tree nodes. Come to think of it, the
right thing to do is to make the printing of stack values be
user-definable. It would also be useful to include the filename &
line# of the file being parsed, but the main filename & line# should
continue to be that of grammar.y

Anyway, this feature has saved my life on numerous occasions. The way
I customarily use it is to first run bison with the traces on, isolate
the sequence of reductions that interests me, put those traces in a
buffer and force it into compile-mode, then visit each of those lines
in the grammar and set breakpoints with C-x SPACE. Then, I can run
again under the control of gdb and stop at each semantic action.
With the hex addresses of tree nodes, I can inspect the values
associated with any rhs token.

You like?

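The trace format described in the message above can be sketched in C
as follows. This is an illustration only: format_reduction and its
parameters are hypothetical names, not part of bison.simple, and stack
values are assumed to be pointers that fit in an unsigned long.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Write "FILE:LINE: LHS: SYM(0xVALUE) ..." into BUF, the layout that
   Emacs compile mode parses like a C error message.  RHS and VALUES
   each hold N entries.  Return the length written, or -1 if BUF (of
   SIZE bytes) is too small.  */
int
format_reduction (char *buf, size_t size, const char *file, int line,
                  const char *lhs, const char *const *rhs,
                  const void *const *values, int n)
{
  int len = snprintf (buf, size, "%s:%d: %s:", file, line, lhs);
  int i;
  for (i = 0; i < n; ++i)
    {
      int k;
      if (len < 0 || (size_t) len >= size)
        return -1;
      /* Assumption: a stack value is a pointer; print it in hex.  */
      k = snprintf (buf + len, size - (size_t) len, " %s(0x%lx)", rhs[i],
                    (unsigned long) (uintptr_t) values[i]);
      if (k < 0)
        return -1;
      len += k;
    }
  if (len < 0 || (size_t) len >= size)
    return -1;
  return len;
}
```

Called with "grammar.y", 1234, "foo", the symbols bar and baz, and the
values 0x123456 and 0x345678, it produces the sample line quoted in
the message. Making the value printer user-definable, as the message
suggests, would mean replacing the fixed hex format with a callback.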
* input synclines
Some users create their foo.y files, and equip them with #line. Bison
should recognize these, and preserve them.

* BTYacc
See if we can integrate backtracking in Bison. Contact the BTYacc
maintainers.

* Automaton report
Display more clearly the lookaheads for each item.

* RR conflicts
See if we can use precedence between rules to solve RR conflicts. See
what POSIX says.

* Precedence
It is unfortunate that there is a total order for precedence. It
makes it impossible to have modular precedence information. We should
move to partial orders.

* Parsing grammars
Rewrite the reader in Bison.

* Problems with aliases
From: "Baum, Nathan I" <s0009525@chelt.ac.uk>
Subject: Token Alias Bug
To: "'bug-bison@gnu.org'" <bug-bison@gnu.org>

I've noticed a bug in bison. Sadly, our eternally wise sysadmins won't let
us use CVS, so I can't find out if it's been fixed already...

Basically, I made a program (in flex) that went through a .y file looking
for "..."-tokens, and then output a %token line for each. For
single-character ""-tokens, I reasoned, I could just use
[%token 'A' "A"]. However, this causes Bison to output a [#define 'A' 65],
which cpp chokes on, not unreasonably. (And even if cpp didn't choke, I
obviously wouldn't want (char)'A' to be replaced with (int)65 throughout my
code.)

Bison normally forgoes outputting a #define for a character token. However,
it always outputs an aliased token -- even if the token is an alias for a
character token. We don't want that. The problem is in /output.c/, as I
recall. When it outputs the token definitions, it checks for a character
token, and then checks for an alias token. If the character token check is
placed after the alias check, then it works correctly.

Alias tokens seem to be something of a kludge. What about an [%alias "..."]
command...

        %alias T_IF "IF"

Hmm. I can't help thinking... What about a --generate-lex option that
creates an .l file for the alias tokens used... (Or an option to make a
gperf file, etc...)

* Presentation of the report file
From: "Baum, Nathan I" <s0009525@chelt.ac.uk>
Subject: Token Alias Bug
To: "'bug-bison@gnu.org'" <bug-bison@gnu.org>

I've also noticed something, that whilst not *wrong*, is inconvenient: I
use the verbose mode to help find the causes of unresolved shift/reduce
conflicts. However, this mode insists on starting the .output file with a
list of *resolved* conflicts, something I find quite useless. Might it be
possible to define a -v mode, and a -vv mode -- Where the -vv mode shows
everything, but the -v mode only tells you what you need for examining
conflicts? (Or, perhaps, a "*** This state has N conflicts ***" marker above
each state with conflicts.)

* $undefined
From Hans:
- If the Bison generated parser experiences an undefined number in the
character range, that character is written out in diagnostic messages, in
addition to the $undefined value.

Suggest: Change the name $undefined to undefined; looks better in outputs.

* Default Action
From Hans:
- For use with my C++ parser, I transported the "switch (yyn)" statement
that Bison writes to the bison.simple skeleton file. This way, I can remove
the current default rule $$ = $1 implementation, which causes a double
assignment to $$ which may not be OK under C++, replacing it with a
"default:" part within the switch statement.

Note that the default rule $$ = $1, when typed, is perfectly OK under C,
but in the C++ implementation I made, this rule is different from
$<type_name>$ = $<type_name>1. I therefore think that one should implement
a Bison option where every typed default rule is explicitly written out
(same typed rules can of course be grouped together).

* Pre and post actions.
From: Florian Krohm <florian@edamail.fishkill.ibm.com>
Subject: YYACT_EPILOGUE
To: bug-bison@gnu.org
X-Sent: 1 week, 4 days, 14 hours, 38 minutes, 11 seconds ago

The other day I had the need for explicitly building the parse tree. I
used %locations for that and defined YYLLOC_DEFAULT to call a function
that returns the tree node for the production. Easy. But I also needed
to assign the S-attribute to the tree node. That cannot be done in
YYLLOC_DEFAULT, because it is invoked before the action is executed.
The way I solved this was to define a macro YYACT_EPILOGUE that would
be invoked after the action. For reasons of symmetry I also added
YYACT_PROLOGUE. Although I had no use for that I can envision how it
might come in handy for debugging purposes.
All that is needed is to add

#if YYLSP_NEEDED
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen, yyloc, (yylsp - yylen));
#else
    YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
#endif

at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.

I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
to bison. If you're interested, I'll work on a patch.

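The hook mechanism proposed above can be sketched outside of any real
skeleton. The following C fragment is an illustration under stated
assumptions: toy_reduce_add, epilogue_calls and last_value are
hypothetical names, the location-tracking variant (YYLSP_NEEDED) is
left out, and invoking YYACT_PROLOGUE just before the action is a
guess at what "ditto" means in the message.

```c
typedef int YYSTYPE;

/* State recorded by the user-supplied epilogue hook (hypothetical).  */
int epilogue_calls;
YYSTYPE last_value;

/* What a grammar author might define: observe $$ after each action.  */
#define YYACT_EPILOGUE(Val, Rhs, Len) \
  (epilogue_calls++, last_value = (Val))

/* What the skeleton might provide: hooks default to no-ops, so
   grammars that do not define them are unaffected.  */
#ifndef YYACT_PROLOGUE
# define YYACT_PROLOGUE(Val, Rhs, Len) ((void) 0)
#endif
#ifndef YYACT_EPILOGUE
# define YYACT_EPILOGUE(Val, Rhs, Len) ((void) 0)
#endif

/* Toy stand-in for one reduction of "exp: exp '+' exp".  YYVSP points
   at the value of the last rhs symbol, as in the generated parser.  */
YYSTYPE
toy_reduce_add (YYSTYPE *yyvsp, int yylen)
{
  YYSTYPE yyval = yyvsp[1 - yylen];     /* Default action: $$ = $1.  */
  YYACT_PROLOGUE (yyval, (yyvsp - yylen), yylen);
  yyval = yyvsp[-2] + yyvsp[0];         /* User action: $$ = $1 + $3.  */
  YYACT_EPILOGUE (yyval, (yyvsp - yylen), yylen);
  return yyval;
}
```

Because the hooks are macros, a grammar that leaves them undefined
pays nothing at run time, which seems to match the intent of the
proposal.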
-----

Copyright (C) 2001, 2002 Free Software Foundation, Inc.

This file is part of GNU Autoconf.

GNU Autoconf is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.

GNU Autoconf is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with autoconf; see the file COPYING. If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA.