Provide a means to add a prefix to the name of the tokens as output in the
generated files. Because of name clashes, it is good to have such a
prefix such as TOK_ that protects from names such as EOF, FILE etc.
But it clutters the grammar itself.
* data/bison.m4 (token.prefix): Empty by default.
* data/c.m4 (b4_token_enum, b4_token_define): Use it.
* data/lalr1.cc (b4_symbol): Ditto.
This is allows the user to get the type of a token return by
yylex.
* data/lalr1.cc (symbol::token): New.
(yytoknum_): Define when %define lex_symbol, independently of
%debug.
(yytoken_number_): Move into...
(symbol::token): here, since that's the only use.
The other one is YYPRINT which was not officially supported
by lalr1.cc, and anyway it did not work since YYPRINT uses this
array under a different name (yytoknum).
Akim Demaille [Thu, 28 Aug 2008 09:50:09 +0000 (11:50 +0200)]
Define make_symbol in the header.
To reach good performances these functions should be inlined (yet this is
to measure precisely). To this end they must be available to the caller.
* data/lalr1.cc (b4_symbol_constructor_definition_): Qualify
location_type with the class name.
Since will now be output in the header, declare "inline".
No longer use b4_symbol_constructor_specializations, but
b4_symbol_constructor_definitions in the header.
Don't call it in the *.cc file.
Akim Demaille [Thu, 28 Aug 2008 09:50:14 +0000 (11:50 +0200)]
Remove useless class specification.
* data/lalr1.cc (b4_symbol_constructor_specialization_): No need
to refer to the class name to use a type defined by the class for
arguments of member functions.
Akim Demaille [Thu, 28 Aug 2008 08:32:14 +0000 (10:32 +0200)]
Finer input type for yytranslate.
This patch is debatable: the tradition expects yylex to return an int
which happens to correspond to token_number (which is an enum). This
allows for instance to return characters (such as '*' etc.). But this
goes against the stronger typing I am trying to have with the new
lex interface which return a symbol_type. So in this case, feed
yytranslate_ with a token_type.
* data/lalr1.cc (yytranslate_): When in %define lex-symbol,
expect a token_type.
Akim Demaille [Tue, 26 Aug 2008 18:25:58 +0000 (20:25 +0200)]
Use b4_type_names for the union type.
The union used to compute the size of the variant used to iterate over the
type of all the symbols, with a lot of redundancy. Now iterate over the
lists of symbols having the same type-name.
* data/lalr1.cc (b4_char_sizeof_): New.
(b4_char_sizeof): Use it.
Adjust to be called with a list of numbers instead of a single
number.
Adjust its caller for new-line issues.
Akim Demaille [Tue, 26 Aug 2008 18:10:03 +0000 (20:10 +0200)]
Define the "identifier" of a symbol.
Symbols may have several string representations, for instance if they
have an alias. What I call its "id" is a string that can be used as
an identifier. May not exist.
Currently the symbols which have the "tag_is_id" flag set are those that
don't have an alias. Look harder for the id.
* src/output.c (is_identifier): Move to...
* src/symtab.c (is_identifier): here.
* src/symtab.h, src/symtab.c (symbol_id_get): New.
* src/output.c (symbol_definitions_output): Use it to define "id"
and "has_id".
Remove the definition of "tag_is_id".
* data/lalr1.cc: Use the "id" and "has_id" whereever "tag" and
"tag_is_id" were used to produce code.
We still use "tag" for documentation.
Akim Demaille [Mon, 25 Aug 2008 11:52:51 +0000 (13:52 +0200)]
Locations are no longer required by lalr1.cc.
* data/lalr1.cc (_b4_args, b4_args): New.
Adjust all uses of locations to make them optional.
* tests/c++.at (AT_CHECK_VARIANTS): No longer use the locations.
(AT_CHECK_NAMESPACE): Check the use of locations.
* tests/calc.at (_AT_DATA_CALC_Y): Adjust to be usable with or
without locations with lalr1.cc.
Test these cases.
* tests/output.at: Check lalr1.cc with and without location
support.
* tests/regression.at (_AT_DATA_EXPECT2_Y, _AT_DATA_DANCER_Y):
Don't use locations.
Akim Demaille [Thu, 21 Aug 2008 19:46:13 +0000 (21:46 +0200)]
Support i18n of the parse error messages.
* TODO (lalr1.cc/I18n): Remove.
* data/lalr1.cc (yysyntax_error_): Support the translation of the
error messages, as done in yacc.c.
Stay within the yy* pseudo namespace.
Akim Demaille [Tue, 19 Aug 2008 19:39:03 +0000 (21:39 +0200)]
Make it possible to return a symbol_type from yylex.
* data/lalr1.cc (b4_lex_symbol_if): New.
(parse): When lex_symbol is defined, expected yylex to return the
complete lookahead.
* etc/bench.pl.in (generate_grammar_list): Extend to support this
yylex interface.
(bench_variant_parser): Exercise it.
Akim Demaille [Mon, 18 Aug 2008 20:16:40 +0000 (22:16 +0200)]
Let yytranslate handle the eof case.
* data/lalr1.cc (yytranslate_): Handle the EOF case.
Adjust callers.
No longer expect yychar to be equal to yyeof_, rather, test the
lookahead's (translated) kind.
Akim Demaille [Mon, 18 Aug 2008 13:48:36 +0000 (15:48 +0200)]
Introduce make_symbol.
make_symbol provides a means to construct a full symbol (kind, value,
location) in a single shot. It is meant to be a Symbol constructor,
parameterized by the symbol kind so that overloading would prevent
incorrect kind/value pairs. Unfortunately parameterized constructors do
not work well in C++ (unless the parameter also appears as an argument,
which is not acceptable), hence the use of a function instead of a
constructor.
* data/lalr1.cc (b4_symbol_constructor_declaration_)
(b4_symbol_constructor_declarations)
(b4_symbol_constructor_specialization_)
(b4_symbol_constructor_specializations)
(b4_symbol_constructor_definition_)
(b4_symbol_constructor_definitions): New.
Use them where appropriate to generate declaration, declaration of
the specializations, and implementations of the templated
overloaded function "make_symbol".
(variant::variant): Always define a default ctor.
Also provide a copy ctor.
(symbol_base_type, symbol_type): New ctor overloads for value-less
symbols.
(symbol_type): Now public, so that functions such as yylex can use
it.
Akim Demaille [Tue, 11 Nov 2008 13:42:35 +0000 (14:42 +0100)]
Get rid of tabulations in the Java output.
Test 214 was failing: it greps with a pattern containing [ ]* which
obviously meant to catch spaces and tabs, but contained only tabs.
Tabulations in sources are a nuisance, so to simplify the matter, get rid
of all the tabulations in the Java sources. The other skeletons will be
treated equally later.
* data/java.m4, data/lalr1.java: Untabify.
* tests/java.at: Simplify AT_CHECK_JAVA_GREP invocations:
tabulations are no longer generated.
Di-an Jan [Mon, 10 Nov 2008 13:29:07 +0000 (14:29 +0100)]
Various Java skeleton improvements.
* NEWS: Document them.
General Java skeleton improvements.
* configure.ac (gt_JAVACOMP): Request target of 1.4, which allows
using gcj < 4.3 in the testsuite, according to comments in
gnulib/m4/javacomp.m4.
* data/java.m4 (stype, parser_class_name, lex_throws, throws,
location_type, position_type): Remove extraneous brackets from
b4_percent_define_default.
(b4_lex_param, b4_parse_param): Remove extraneous brackets from
m4_define and m4_define_default.
* data/lalr1.java (b4_pre_prologue): Change to b4_user_post_prologue,
which marks the end of user code with appropriate syncline, like all
the other skeletons.
(b4_user_post_prologue): Add. Don't silently drop.
(yylex): Remove.
(parse): Inline yylex.
* doc/bison.texinfo (bisonVersion, bisonSkeleton): Document.
(%{...%}): Fix typo of %code imports.
* tests/java.at (AT_JAVA_COMPILE): Add "java" keyword.
Support annotations on parser class with %define annotations.
* data/lalr1.java (annotations): Add to parser class modifier.
* doc/bison.texinfo (Java Parser Interface): Document
%define annotations.
(Java Declarations Summary): Document %define annotations.
* tests/java.at (Java parser class modifiers): Test annotations.
Do not generate code for %error-verbose unless requested.
* data/lalr1.java (errorVerbose): Rename to yyErrorVerbose.
Make private. Make conditional on %error-verbose.
(getErrorVerbose, setErrorVerbose): New.
(yytnamerr_): Make conditional on %error-verbose.
(yysyntax_error): Make some code conditional on %error-verbose.
* doc/bison.texinfo (Java Bison Interface): Remove the parts
about %error-verbose having no effect.
(getErrorVerbose, setErrorVerbose): Document.
Move constants for token names to Lexer interface.
* data/lalr1.java (Lexer): Move EOF, b4_token_enums(b4_tokens) here.
* data/java.m4 (b4_token_enum): Indent for move to Lexer interface.
(parse): Qualify EOF to Lexer.EOF.
* doc/bison.texinfo (Java Parser Interface): Move documentation of
EOF and token names to Java Lexer Interface.
* tests/java.at (_AT_DATA_JAVA_CALC_Y): Remove Calc qualifier.
Make yyerror public.
* data/lalr1.java (Lexer.yyerror): Use longer parameter name.
(yyerror): Change to public. Add Javadoc comments. Use longer
parameter names. Make the body rather than the declarator
conditional on %locations.
* doc/bison.texinfo (yyerror): Document. Don't mark as protected.
Allow user to add code to the constructor with %code init.
* data/java.m4 (b4_init_throws): New, for %define init_throws.
* data/lalr1.java (YYParser.YYParser): Add b4_init_throws.
Add %code init to the front of the constructor body.
* doc/bison.texinfo (YYParser.YYParser): Document %code init
and %define init_throws.
(Java Declarations Summary): Document %code init and
%define init_throws.
* tests/java.at (Java %parse-param and %lex-param): Adjust grep.
(Java constructor init and init_throws): Add tests.
Akim Demaille [Mon, 18 Aug 2008 12:44:05 +0000 (14:44 +0200)]
Update TODO.
* TODO (-D): is implemented.
(associativity): Same precedence must have the same associativity.
For instance, how can a * b / c be parsed if * is %left and / is
%right?
(YYERRORCODE, YYFAIL, YYBACKUP): New.
Akim Demaille [Sat, 16 Aug 2008 18:32:37 +0000 (20:32 +0200)]
More information about the symbols.
* src/output.c (type_names_output): Document all the symbols,
including those that don't have a type-name.
(symbol_definitions_output): Define "is_token" and
"has_type_name".
* data/lalr1.cc (b4_type_action_): Skip symbols that have an empty
type-name, now that they are defined too in b4_type_names.
Akim Demaille [Sat, 16 Aug 2008 13:29:30 +0000 (15:29 +0200)]
Avoid trailing spaces.
* data/c.m4: b4_comment(TEXT): Don't indent empty lines.
* data/lalr1.cc: Don't indent before rule and symbol actions, as
they can be empty, and anyway this incorrectly indents the first
action.
Akim Demaille [Wed, 13 Aug 2008 11:18:22 +0000 (13:18 +0200)]
Adjust verbose message to using emacs.
* etc/bench.pl.in: Inform compilation-mode when we change the
directory.
(generate_grammar_list): Recognize %define "variant" in addition
to %define variant.
Akim Demaille [Tue, 12 Aug 2008 19:48:53 +0000 (21:48 +0200)]
Change the handling of the symbols in the skeletons.
Before we were using tables which lines were the symbols and which
columns were things like number, tag, type-name etc. It is was
difficult to extend: each time a column was added, all the numbers had
to be updated (you asked for colon $2, not for "tag"). Also, it was
hard to filter these tables when only a subset of the symbols (say the
tokens, or the nterms, or the tokens that have and external number
*and* a type-name) was of interest.
Now instead of monolithic tables, we define one macro per cell. For
instance "b4_symbol(0, tag)" is a macro name which contents is
self-decriptive. The macro "b4_symbol" provides easier access to
these cells.
Akim Demaille [Thu, 7 Aug 2008 21:15:34 +0000 (23:15 +0200)]
Add %precedence support.
Unfortunately it is not possible to reuse the %prec directive. This
is because to please POSIX, we do not require to end the rules with a
semicolon. As a result,
foo: bar %prec baz
is ambiguous: either a rule which precedence is that of baz, or a rule,
and then a declaration of the precedence of the token baz.
* doc/bison.texinfo: Document %precedence.
(Precedence Only): New.
* src/assoc.h, src/assoc.c (precedence_assoc): New.
* src/conflicts.c (resolve_sr_conflict): Support it.
* src/scan-gram.l, src/parse-gram.y (%precedence): New token.
Parse it.
* tests/calc.at: Use %precedence for NEG.
* tests/conflicts.at (%precedence does not suffice)
(%precedence suffices): New tests.
Akim Demaille [Sat, 2 Aug 2008 20:06:49 +0000 (22:06 +0200)]
bench.pl -d, --directive.
* etc/bench.pl.in (@directive): New.
(&bench_grammar): Use it.
(&bench_list_grammar): New, to provide access to the "variant"
grammar.
Use it.
(getopts): Support -d, --directive.
Akim Demaille [Sat, 2 Aug 2008 19:42:48 +0000 (21:42 +0200)]
Introduce a hierarchy for symbols.
* data/lalr1.cc (symbol_base_type, symbol_type): New.
(data_type): Rename as...
(stack_symbol_type): this.
Derive from symbol_base_type.
(yy_symbol_value_print_): Merge into...
(yy_symbol_print_): this.
Rename as...
(yy_print_): this.
(yydestruct_): Rename as...
(yy_destroy_): this.
(b4_symbols_actions, YY_SYMBOL_PRINT): Adjust.
(parser::parse): yyla is now of symbol_type.
Use its type member instead of yytoken.
Akim Demaille [Sat, 2 Aug 2008 12:18:48 +0000 (14:18 +0200)]
Handle semantic value and location together.
* data/lalr1.cc (b4_symbol_actions): Bounce $$ and @$ to
yydata.value and yydata.location.
(yy_symbol_value_print_, yy_symbol_print_, yydestruct_)
(YY_SYMBOL_PRINT): Now take semantic value and location as a
single arg.
Adjust all callers.
(yydestruct_): New overload for a stack symbol.
Rely on the state stack to display reduction traces.
To display rhs symbols before a reduction, we used information about the rule
reduced, which required the tables yyrhs and yyprhs. Now use rely only on the
state stack to get the same information.
* data/lalr1.cc (b4_rhs_data, b4_rhs_state): New.
Use them.
(parser::yyrhs_, parser::yyprhs_): Remove.
(parser::yy_reduce_print_): Use the state stack.
* data/lalr1.cc (b4_lhs_value, b4_lhs_location): Adjust to using
yylhs.
(parse): Replace yyval and yyloc with yylhs.value and
yylhs.location.
After a user action, compute yylhs.state earlier.
(yyerrlab1): Do not play tricks with yylhs.location, rather, use a
fresh error_token.
Joel E. Denny [Fri, 7 Nov 2008 22:20:44 +0000 (17:20 -0500)]
Clean up %skeleton and %language priority implementation.
* src/getargs.c (skeleton_prio): Use default_prio rather than 2, and
remove static qualifier because others will soon need to see it.
(language_prio): Likewise.
(getargs): Use command_line_prio rather than 0.
* src/getargs.h (command_line_prio, grammar_prio, default_prio): New
enum fields.
(skeleton_prio): Extern it.
(language_prio): Extern it.
* src/parse-gram.y: Use grammar_prio rather than 1.
Pass command line location to skeleton_arg and language_argmatch.
* src/getargs.h, src/getargs.c (skeleton_arg, language_argmatch):
The location argument is now mandatory.
Adjust all dependencies.
(getargs): Use command_line_location.
Initialize the muscle table before parsing the command line.
* src/getargs.c (quotearg.h, muscle_tab.h): Include.
(getargs): Define file_name.
* src/main.c (main): Initialize muscle_tab before calling
getargs.
* src/muscle_tab.c (muscle_init): No longer define file_name, as
its value is not available yet.
* build-aux/cross-options.pl: The argument ends at the first
space, not the first non-symbol character.
Use @var for each word appearing the argument description.
Destroy the variants that remain on the stack in case of error.
* data/lalr1-fusion.cc (yydestruct_): Invoke the variant's
destructor.
Display the value only if yymsg is nonnull.
(yyreduce): Invoke yydestruct_ when popping lhs symbols.
This is used to help the user catch cases where some value gets
ovewritten by a new one. This should not happen, as this will
probably leak.
Unfortunately this uncovered a bug in the C++ parser itself: the
lookahead value was not destroyed between two calls to yylex. For
instance if the previous lookahead was a std::string, and then an int,
then the value of the std::string was correctly taken (i.e., the
lookahead was now an empty string), but std::string structure itself
was not reclaimed.
This is now done in variant::build(other&) (which is used to take the
value of the lookahead): other is not only stolen from its value, it
is also destroyed. This incurs a new performance penalty of a few
percent, and union becomes faster again.
* data/lalr1-fusion.cc (variant::build(other&)): Destroy other.
(b4_variant_if): New.
(variant::built): New.
Use it whereever the status of the variant changes.
* etc/bench.pl.in: Check the penalty of %define assert.
Joel E. Denny [Tue, 4 Nov 2008 20:03:00 +0000 (15:03 -0500)]
Fix user actions without a trailing semicolon.
Reported by Sergei Steshenko at
<http://lists.gnu.org/archive/html/bug-bison/2008-11/msg00001.html>.
* THANKS (Sergei Steshenko): Add.
* src/scan-code.l (SC_RULE_ACTION): Fix it.
* tests/regression.at (Fix user actions without a trailing semicolon):
New test case.