Akim Demaille [Fri, 24 Oct 2008 01:01:48 +0000 (20:01 -0500)]
Support parametric types.
There are two issues to handle: first scanning nested angle bracket pairs
to support types such as std::pair< std::string, std::list<std::string> > >.
Another issue is to address idiosyncracies of C++: do not glue two closing
angle brackets together (otherwise it's operator>>), and avoid sticking
blindly a TYPE to the opening <, as it can result in '<:' which is a
digraph for '['.
* src/scan-gram.l (brace_level): Rename as...
(nesting): this.
(SC_TAG): New.
Implement support for complex tags.
(tag): Accept \n, but not <.
* data/lalr1.cc (b4_symbol_value, b4_symbol_value_template)
(b4_symbol_variant): Leave space around types as parameters.
* examples/variant.yy: Use nested template types and leading ::.
* src/parse-gram.y (TYPE, TYPE_TAG_ANY, TYPE_TAG_NONE, type.opt):
Rename as...
(TAG, TAG_ANY, TAG_NONE, tag.opt): these.
* tests/c++.at: Test parametric types.
Akim Demaille [Fri, 10 Oct 2008 15:04:23 +0000 (17:04 +0200)]
Test token.prefix.
This is not sufficient, but we test at least that the make_SYMBOL
interface is not affected by token.prefix. A more general test
will be implemented when the support of token.prefix is generalized
to more skeletons.
* tests/c++.at: One more variant test, using token.prefix.
* tests/Makefile.am: Rename as...
* tests/local.mk: this.
* Makefile.am, configure.ac: Adjust.
* Makefile.am (DISTCLEANFILES): Define.
(maintainer-check, maintainer-xml-check, maintainer-push-check):
Remove, we no longer need to bounce to the real targets.
It does not work, and I don't know how it was supposed to work: it seems
to be looking for sources in the build tree. I just moved it at a better
place, fixing it is still required.
* src/Makefile.am: Rename as...
* src/local.mk: this.
Prefix all the paths with src/.
(AUTOMAKE_OPTIONS): Build object files in the sub dirs.
(AM_CPPFLAGS): Find find in builddir/src.
(YACC): Move the flags into...
(AM_YFLAGS): here.
* maint.mk (sc_tight_scope): Disable.
It used to bounce to the version in src/Makefile.am which is now
part of this very Makefile.
* Makefile.am, configure.ac: Adjust.
* src/scan-code-c.c, src/scan-code.l: We can no longer rely on
include "..." to find files "here": we are no longer in src/, so
qualify the includes with src/.
* doc/Makefile.am (PREPATH): No longer include the top_builddir
prefix.
(.x.1): Adjust to be able to create src/foo from the top level
Makefile, instead of going bounce to src/Makefile the creation of
foo.
Provide convenience constructors for locations and positions.
* data/location.cc (position::position): Accept file, line and
column as arguments with default values.
Always qualify initial line and column literals as unsigned.
(location::location): Provide convenience constructors.
Instead of using make_symbol<TOK_FOO>, generate make_FOO for each token type.
Using template buys us nothing, and makes it uselessly complex to
construct a symbol. Besides, it could not be generalized to other
languages, while make_FOO would work in C/Java etc.
* data/lalr1.cc (b4_symbol_): New.
(b4_symbol): Use it.
(b4_symbol_constructor_declaration_)
(b4_symbol_constructor_definition_): Instead of generating
specializations of an overloaded template function, just generate
several functions whose names are forged from the token names
without the token.prefix.
(b4_symbol_constructor_declarations): Generate them for all the
symbols, not just by class of symbol type, now that instead of
specializing a function template by the token, we generate a
function named after the token.
(b4_symbol_constructor_specialization_)
(b4_symbol_constructor_specializations): Remove.
* etc/bench.pl.in: Adjust to this new API.
Provide a means to add a prefix to the name of the tokens as output in the
generated files. Because of name clashes, it is good to have such a
prefix such as TOK_ that protects from names such as EOF, FILE etc.
But it clutters the grammar itself.
* data/bison.m4 (token.prefix): Empty by default.
* data/c.m4 (b4_token_enum, b4_token_define): Use it.
* data/lalr1.cc (b4_symbol): Ditto.
This is allows the user to get the type of a token return by
yylex.
* data/lalr1.cc (symbol::token): New.
(yytoknum_): Define when %define lex_symbol, independently of
%debug.
(yytoken_number_): Move into...
(symbol::token): here, since that's the only use.
The other one is YYPRINT which was not officially supported
by lalr1.cc, and anyway it did not work since YYPRINT uses this
array under a different name (yytoknum).
Akim Demaille [Thu, 28 Aug 2008 09:50:09 +0000 (11:50 +0200)]
Define make_symbol in the header.
To reach good performances these functions should be inlined (yet this is
to measure precisely). To this end they must be available to the caller.
* data/lalr1.cc (b4_symbol_constructor_definition_): Qualify
location_type with the class name.
Since will now be output in the header, declare "inline".
No longer use b4_symbol_constructor_specializations, but
b4_symbol_constructor_definitions in the header.
Don't call it in the *.cc file.
Akim Demaille [Thu, 28 Aug 2008 09:50:14 +0000 (11:50 +0200)]
Remove useless class specification.
* data/lalr1.cc (b4_symbol_constructor_specialization_): No need
to refer to the class name to use a type defined by the class for
arguments of member functions.
Akim Demaille [Thu, 28 Aug 2008 08:32:14 +0000 (10:32 +0200)]
Finer input type for yytranslate.
This patch is debatable: the tradition expects yylex to return an int
which happens to correspond to token_number (which is an enum). This
allows for instance to return characters (such as '*' etc.). But this
goes against the stronger typing I am trying to have with the new
lex interface which return a symbol_type. So in this case, feed
yytranslate_ with a token_type.
* data/lalr1.cc (yytranslate_): When in %define lex-symbol,
expect a token_type.
Akim Demaille [Tue, 26 Aug 2008 18:25:58 +0000 (20:25 +0200)]
Use b4_type_names for the union type.
The union used to compute the size of the variant used to iterate over the
type of all the symbols, with a lot of redundancy. Now iterate over the
lists of symbols having the same type-name.
* data/lalr1.cc (b4_char_sizeof_): New.
(b4_char_sizeof): Use it.
Adjust to be called with a list of numbers instead of a single
number.
Adjust its caller for new-line issues.
Akim Demaille [Tue, 26 Aug 2008 18:10:03 +0000 (20:10 +0200)]
Define the "identifier" of a symbol.
Symbols may have several string representations, for instance if they
have an alias. What I call its "id" is a string that can be used as
an identifier. May not exist.
Currently the symbols which have the "tag_is_id" flag set are those that
don't have an alias. Look harder for the id.
* src/output.c (is_identifier): Move to...
* src/symtab.c (is_identifier): here.
* src/symtab.h, src/symtab.c (symbol_id_get): New.
* src/output.c (symbol_definitions_output): Use it to define "id"
and "has_id".
Remove the definition of "tag_is_id".
* data/lalr1.cc: Use the "id" and "has_id" whereever "tag" and
"tag_is_id" were used to produce code.
We still use "tag" for documentation.
Akim Demaille [Mon, 25 Aug 2008 11:52:51 +0000 (13:52 +0200)]
Locations are no longer required by lalr1.cc.
* data/lalr1.cc (_b4_args, b4_args): New.
Adjust all uses of locations to make them optional.
* tests/c++.at (AT_CHECK_VARIANTS): No longer use the locations.
(AT_CHECK_NAMESPACE): Check the use of locations.
* tests/calc.at (_AT_DATA_CALC_Y): Adjust to be usable with or
without locations with lalr1.cc.
Test these cases.
* tests/output.at: Check lalr1.cc with and without location
support.
* tests/regression.at (_AT_DATA_EXPECT2_Y, _AT_DATA_DANCER_Y):
Don't use locations.
Akim Demaille [Thu, 21 Aug 2008 19:46:13 +0000 (21:46 +0200)]
Support i18n of the parse error messages.
* TODO (lalr1.cc/I18n): Remove.
* data/lalr1.cc (yysyntax_error_): Support the translation of the
error messages, as done in yacc.c.
Stay within the yy* pseudo namespace.
Akim Demaille [Tue, 19 Aug 2008 19:39:03 +0000 (21:39 +0200)]
Make it possible to return a symbol_type from yylex.
* data/lalr1.cc (b4_lex_symbol_if): New.
(parse): When lex_symbol is defined, expected yylex to return the
complete lookahead.
* etc/bench.pl.in (generate_grammar_list): Extend to support this
yylex interface.
(bench_variant_parser): Exercise it.
Akim Demaille [Mon, 18 Aug 2008 20:16:40 +0000 (22:16 +0200)]
Let yytranslate handle the eof case.
* data/lalr1.cc (yytranslate_): Handle the EOF case.
Adjust callers.
No longer expect yychar to be equal to yyeof_, rather, test the
lookahead's (translated) kind.
Akim Demaille [Mon, 18 Aug 2008 13:48:36 +0000 (15:48 +0200)]
Introduce make_symbol.
make_symbol provides a means to construct a full symbol (kind, value,
location) in a single shot. It is meant to be a Symbol constructor,
parameterized by the symbol kind so that overloading would prevent
incorrect kind/value pairs. Unfortunately parameterized constructors do
not work well in C++ (unless the parameter also appears as an argument,
which is not acceptable), hence the use of a function instead of a
constructor.
* data/lalr1.cc (b4_symbol_constructor_declaration_)
(b4_symbol_constructor_declarations)
(b4_symbol_constructor_specialization_)
(b4_symbol_constructor_specializations)
(b4_symbol_constructor_definition_)
(b4_symbol_constructor_definitions): New.
Use them where appropriate to generate declaration, declaration of
the specializations, and implementations of the templated
overloaded function "make_symbol".
(variant::variant): Always define a default ctor.
Also provide a copy ctor.
(symbol_base_type, symbol_type): New ctor overloads for value-less
symbols.
(symbol_type): Now public, so that functions such as yylex can use
it.
Akim Demaille [Tue, 11 Nov 2008 13:42:35 +0000 (14:42 +0100)]
Get rid of tabulations in the Java output.
Test 214 was failing: it greps with a pattern containing [ ]* which
obviously meant to catch spaces and tabs, but contained only tabs.
Tabulations in sources are a nuisance, so to simplify the matter, get rid
of all the tabulations in the Java sources. The other skeletons will be
treated equally later.
* data/java.m4, data/lalr1.java: Untabify.
* tests/java.at: Simplify AT_CHECK_JAVA_GREP invocations:
tabulations are no longer generated.
Di-an Jan [Mon, 10 Nov 2008 13:29:07 +0000 (14:29 +0100)]
Various Java skeleton improvements.
* NEWS: Document them.
General Java skeleton improvements.
* configure.ac (gt_JAVACOMP): Request target of 1.4, which allows
using gcj < 4.3 in the testsuite, according to comments in
gnulib/m4/javacomp.m4.
* data/java.m4 (stype, parser_class_name, lex_throws, throws,
location_type, position_type): Remove extraneous brackets from
b4_percent_define_default.
(b4_lex_param, b4_parse_param): Remove extraneous brackets from
m4_define and m4_define_default.
* data/lalr1.java (b4_pre_prologue): Change to b4_user_post_prologue,
which marks the end of user code with appropriate syncline, like all
the other skeletons.
(b4_user_post_prologue): Add. Don't silently drop.
(yylex): Remove.
(parse): Inline yylex.
* doc/bison.texinfo (bisonVersion, bisonSkeleton): Document.
(%{...%}): Fix typo of %code imports.
* tests/java.at (AT_JAVA_COMPILE): Add "java" keyword.
Support annotations on parser class with %define annotations.
* data/lalr1.java (annotations): Add to parser class modifier.
* doc/bison.texinfo (Java Parser Interface): Document
%define annotations.
(Java Declarations Summary): Document %define annotations.
* tests/java.at (Java parser class modifiers): Test annotations.
Do not generate code for %error-verbose unless requested.
* data/lalr1.java (errorVerbose): Rename to yyErrorVerbose.
Make private. Make conditional on %error-verbose.
(getErrorVerbose, setErrorVerbose): New.
(yytnamerr_): Make conditional on %error-verbose.
(yysyntax_error): Make some code conditional on %error-verbose.
* doc/bison.texinfo (Java Bison Interface): Remove the parts
about %error-verbose having no effect.
(getErrorVerbose, setErrorVerbose): Document.
Move constants for token names to Lexer interface.
* data/lalr1.java (Lexer): Move EOF, b4_token_enums(b4_tokens) here.
* data/java.m4 (b4_token_enum): Indent for move to Lexer interface.
(parse): Qualify EOF to Lexer.EOF.
* doc/bison.texinfo (Java Parser Interface): Move documentation of
EOF and token names to Java Lexer Interface.
* tests/java.at (_AT_DATA_JAVA_CALC_Y): Remove Calc qualifier.
Make yyerror public.
* data/lalr1.java (Lexer.yyerror): Use longer parameter name.
(yyerror): Change to public. Add Javadoc comments. Use longer
parameter names. Make the body rather than the declarator
conditional on %locations.
* doc/bison.texinfo (yyerror): Document. Don't mark as protected.
Allow user to add code to the constructor with %code init.
* data/java.m4 (b4_init_throws): New, for %define init_throws.
* data/lalr1.java (YYParser.YYParser): Add b4_init_throws.
Add %code init to the front of the constructor body.
* doc/bison.texinfo (YYParser.YYParser): Document %code init
and %define init_throws.
(Java Declarations Summary): Document %code init and
%define init_throws.
* tests/java.at (Java %parse-param and %lex-param): Adjust grep.
(Java constructor init and init_throws): Add tests.
Akim Demaille [Mon, 18 Aug 2008 12:44:05 +0000 (14:44 +0200)]
Update TODO.
* TODO (-D): is implemented.
(associativity): Same precedence must have the same associativity.
For instance, how can a * b / c be parsed if * is %left and / is
%right?
(YYERRORCODE, YYFAIL, YYBACKUP): New.
Akim Demaille [Sat, 16 Aug 2008 18:32:37 +0000 (20:32 +0200)]
More information about the symbols.
* src/output.c (type_names_output): Document all the symbols,
including those that don't have a type-name.
(symbol_definitions_output): Define "is_token" and
"has_type_name".
* data/lalr1.cc (b4_type_action_): Skip symbols that have an empty
type-name, now that they are defined too in b4_type_names.
Akim Demaille [Sat, 16 Aug 2008 13:29:30 +0000 (15:29 +0200)]
Avoid trailing spaces.
* data/c.m4: b4_comment(TEXT): Don't indent empty lines.
* data/lalr1.cc: Don't indent before rule and symbol actions, as
they can be empty, and anyway this incorrectly indents the first
action.
Akim Demaille [Wed, 13 Aug 2008 11:18:22 +0000 (13:18 +0200)]
Adjust verbose message to using emacs.
* etc/bench.pl.in: Inform compilation-mode when we change the
directory.
(generate_grammar_list): Recognize %define "variant" in addition
to %define variant.
Akim Demaille [Tue, 12 Aug 2008 19:48:53 +0000 (21:48 +0200)]
Change the handling of the symbols in the skeletons.
Before we were using tables which lines were the symbols and which
columns were things like number, tag, type-name etc. It is was
difficult to extend: each time a column was added, all the numbers had
to be updated (you asked for colon $2, not for "tag"). Also, it was
hard to filter these tables when only a subset of the symbols (say the
tokens, or the nterms, or the tokens that have and external number
*and* a type-name) was of interest.
Now instead of monolithic tables, we define one macro per cell. For
instance "b4_symbol(0, tag)" is a macro name which contents is
self-decriptive. The macro "b4_symbol" provides easier access to
these cells.
Akim Demaille [Thu, 7 Aug 2008 21:15:34 +0000 (23:15 +0200)]
Add %precedence support.
Unfortunately it is not possible to reuse the %prec directive. This
is because to please POSIX, we do not require to end the rules with a
semicolon. As a result,
foo: bar %prec baz
is ambiguous: either a rule which precedence is that of baz, or a rule,
and then a declaration of the precedence of the token baz.
* doc/bison.texinfo: Document %precedence.
(Precedence Only): New.
* src/assoc.h, src/assoc.c (precedence_assoc): New.
* src/conflicts.c (resolve_sr_conflict): Support it.
* src/scan-gram.l, src/parse-gram.y (%precedence): New token.
Parse it.
* tests/calc.at: Use %precedence for NEG.
* tests/conflicts.at (%precedence does not suffice)
(%precedence suffices): New tests.
Akim Demaille [Sat, 2 Aug 2008 20:06:49 +0000 (22:06 +0200)]
bench.pl -d, --directive.
* etc/bench.pl.in (@directive): New.
(&bench_grammar): Use it.
(&bench_list_grammar): New, to provide access to the "variant"
grammar.
Use it.
(getopts): Support -d, --directive.