* src/scan-gram.l (SC_PROLOGUE): Don't eat characters amongst

diff --git a/TODO b/TODO

index 7514df807f3db17fcc7086965a4852ad49947f64..d37250321a8609dc6aa4b4627202902cf2551b04 100644
--- a/TODO
+++ b/TODO
@@ -1,5 +1,41 @@
  -*- outline -*-
  
+* URGENT: Documenting C++ output
+Write initial documentation for the C++ output.
+
+* value_components_used
+Was defined but not used: where was it coming from?  It can't be to
+check if %union is used, since the user is free to $<foo>n on her
+union, isn't she?
+
+* yyerror, yyprint interface
+It should be improved, in particular when using Bison features such as
+locations, and YYPARSE_PARAMS.  For the time being, it is recommended
+to #define yyerror and yyprint to steal internal variables...
+
+* documentation
+Explain $axiom (and maybe change its name: BTYacc names it `goal',
+byacc `$accept' probably based on AT&T Yacc, Meta `Start'...).
+Complete the glossary (item, axiom, ?).
+
+* Error messages
+Some are really funky.  For instance
+
+       type clash (`%s' `%s') on default action
+
+is really weird.  Revisit them all.
+
+* Report documentation
+Extend with error.  The hard part will probably be finding the right
+rule so that a single state does not exhibit too many yet undocumented
+``features''.  Maybe an empty action ought to be presented too.  Shall
+we try to make a single grammar with all these features, or should we
+have several very small grammars?
+
+* Documentation
+Some history of Bison and some bibliography would be most welcome.
+Are there any Texinfo standards for bibliography?
+
  * Several %unions
  I think this is a pleasant (but useless currently) feature, but in the
  future, I want a means to %include other bits of grammars, and _then_
@@ -21,30 +57,10 @@ When implementing multiple-%union support, bare the following in mind:
           char *sval;
         }
  
-* Language independent actions
-
-Currently bison, the generator, transforms $1, $$ and so forth into
-direct C code, manipulating the stacks.  This is problematic, because
-(i) it means that if we want more languages, we need to update the
-generator, and (ii), it forces names everywhere (e.g., the C++
-skeleton would be happy to use other naming schemes, and actually,
-even other accessing schemes).
-
-Therefore we want
-
-1. the generator to replace $1, etc. by M4 macro invocations
-   (b4_dollar(1), b4_at(3), b4_dollar_dollar) etc.
-
-2. the skeletons to define these macros.
-
-But currently the actions are double-quoted, to protect them from M4
-evaluation.  So we need to:
-
-3. stop quoting them
-
-4. change the [ and ] in the actions into @<:@ and @:>@
-
-5. extend the postprocessor to maps these back onto [ and ].
+* --report=conflict-path
+Provide better assistance for understanding the conflicts by providing
+a sample text exhibiting the (LALR) ambiguity.  See the paper from
+DeRemer and Pennello: they already provide the algorithm.
  
  * Coding system independence
  Paul notes:
@@ -59,26 +75,6 @@ Paul notes:
         PDP-10 ports :-) but they should probably be documented
         somewhere.
  
-* Using enums instead of int for tokens.
-Paul suggests:
-
-   #ifndef YYTOKENTYPE
-   # if defined (__STDC__) || defined (__cplusplus)
-      /* Put the tokens into the symbol table, so that GDB and other debuggers
-         know about them.  */
-      enum yytokentype {
-        FOO = 256,
-        BAR,
-        ...
-      };
-      /* POSIX requires `int' for tokens in interfaces.  */
-   #  define YYTOKENTYPE int
-   # endif
-   #endif
-   #define FOO 256
-   #define BAR 257
-   ...
-
  * Output directory
  Akim:
  
@@ -162,7 +158,10 @@ into
         exp: exp '+' exp | exp '&' exp;
  
  when there are no actions.  This can significantly speed up some
-grammars.
+grammars.  I can't find the papers.  In particular the book `LR
+parsing: Theory and Practice' is impossible to find, but according to
+`Parsing Techniques: a Practical Guide', it includes information about
+this issue.  Does anybody have it?
  
  * Stupid error messages
  An example shows it easily:
@@ -188,11 +187,6 @@ src/bison/tests % cd ./testsuite.dir/51
  tests/testsuite.dir/51 % echo "()" | ./calc
  1.2-1.3: parse error, unexpected ')', expecting error or "number" or '-' or '('
  
-* yyerror, yyprint interface
-It should be improved, in particular when using Bison features such as
-locations, and YYPARSE_PARAMS.  For the time being, it is recommended
-to #define yyerror and yyprint to steal internal variables...
-
  * read_pipe.c
  This is not portable to DOS for instance.  Implement a more portable
  scheme.  Sources of inspiration include GNU diff, and Free Recode.
@@ -302,9 +296,6 @@ should recognize these, and preserve them.
  See if we can integrate backtracking in Bison.  Contact the BTYacc
  maintainers.
  
-* Automaton report
-Display more clearly the lookaheads for each item.
-
  * RR conflicts
  See if we can use precedence between rules to solve RR conflicts.  See
  what POSIX says.
@@ -314,55 +305,16 @@ It is unfortunate that there is a total order for precedence.  It
  makes it impossible to have modular precedence information.  We should
  move to partial orders.
  
+This will be possible with a Bison parser for the grammar, as it will
+make it much easier to extend the grammar.
+
  * Parsing grammars
-Rewrite the reader in Bison.
-
-* Problems with aliases
-From: "Baum, Nathan I" <s0009525@chelt.ac.uk>
-Subject: Token Alias Bug
-To: "'bug-bison@gnu.org'" <bug-bison@gnu.org>
-
-I've noticed a bug in bison. Sadly, our eternally wise sysadmins won't let
-us use CVS, so I can't find out if it's been fixed already...
-
-Basically, I made a program (in flex) that went through a .y file looking
-for "..."-tokens, and then outputed a %token
-line for it. For single-character ""-tokens, I reasoned, I could just use
-[%token 'A' "A"]. However, this causes Bison to output a [#define 'A' 65],
-which cppp chokes on, not unreasonably. (And even if cppp didn't choke, I
-obviously wouldn't want (char)'A' to be replaced with (int)65 throughout my
-code.
-
-Bison normally forgoes outputing a #define for a character token. However,
-it always outputs an aliased token -- even if the token is an alias for a
-character token. We don't want that. The problem is in /output.c/, as I
-recall. When it outputs the token definitions, it checks for a character
-token, and then checks for an alias token. If the character token check is
-placed after the alias check, then it works correctly.
-
-Alias tokens seem to be something of a kludge. What about an [%alias "..."]
-command...
-
-       %alias T_IF "IF"
-
-Hmm. I can't help thinking... What about a --generate-lex option that
-creates an .l file for the alias tokens used... (Or an option to make a
-gperf file, etc...)
-
-* Presentation of the report file
-From: "Baum, Nathan I" <s0009525@chelt.ac.uk>
-Subject: Token Alias Bug
-To: "'bug-bison@gnu.org'" <bug-bison@gnu.org>
-
-I've also noticed something, that whilst not *wrong*, is inconvienient: I
-use the verbose mode to help find the causes of unresolved shift/reduce
-conflicts. However, this mode insists on starting the .output file with a
-list of *resolved* conflicts, something I find quite useless. Might it be
-possible to define a -v mode, and a -vv mode -- Where the -vv mode shows
-everything, but the -v mode only tells you what you need for examining
-conflicts? (Or, perhaps, a "*** This state has N conflicts ***" marker above
-each state with conflicts.)
+Rewrite the reader in Flex/Bison.  There will be delicate parts, in
+particular, expect the scanner to be hard to write.  Many interesting
+features cannot be implemented without such a new reader.
  
+I'm on it!  I already have a proto that parses (but the actions are
+not fully written yet).  -- Akim
  
  * $undefined
  From Hans:
@@ -386,6 +338,18 @@ $<type_name>$ = $<type_name>1. I therefore think that one should implement
  a Bison option where every typed default rule is explicitly written out
  (same typed ruled can of course be grouped together).
  
+Note: Robert Anisko handles this.  He knows how to do it.
+
+* Warnings
+It would be nice to have warning support.  See how Autoconf handles
+them; it is fairly well described there.  It would be very nice to
+implement this in such a way that other programs could use
+lib/warnings.[ch].
+
+Don't work on this without first announcing you do, as I already have
+thought about it, and know many of the components that can be used to
+implement it.
+
  * Pre and post actions.
  From: Florian Krohm <florian@edamail.fishkill.ibm.com>
  Subject: YYACT_EPILOGUE
@@ -414,6 +378,10 @@ at the proper place to bison.simple. Ditto for YYACT_PROLOGUE.
  I was wondering what you think about adding YYACT_PROLOGUE/EPILOGUE
  to bison. If you're interested, I'll work on a patch.
  
+* Move to Graphviz
+Well, VCG seems really dead.  Move to Graphviz instead.  Also, equip
+the parser with a means to create the (visual) parse tree.
+
  -----
  
  Copyright (C) 2001, 2002 Free Software Foundation, Inc.