I’m implementing an IDE for Scheme in Eclipse using DLTK. So far, I am writing the grammar to recognize the lexical structure.
I’m following the official EBNF, which can be viewed here:
http://rose-r5rs.googlecode.com/hg/doc/r5rs-grammar.html
I can’t get even a simple form of the numbers grammar to work. For example, for the decimal numbers I have:
grammar r5rsnumbers;
options {
language = Java;
}
program:
NUMBER;
// NUMBERS
NUMBER : /*NUM_2 | NUM_8 |*/ NUM_10; //| NUM_16;
fragment NUM_10 : PREFIX_10 COMPLEX_10;
fragment COMPLEX_10
: REAL_10 (
'@' REAL_10
| '+' (
UREAL_10 'i'
| 'i'
)?
| '-' (
UREAL_10 'i'
| 'i'
)?
)?
| '+' (
UREAL_10 'i'
| 'i'
)?
| '-' (
UREAL_10 'i'
| 'i'
)?;
fragment REAL_10 : SIGN UREAL_10;
fragment UREAL_10
: UINTEGER_10 ('/' UINTEGER_10)?
| DECIMAL_10;
fragment UINTEGER_10 : DIGIT_10+ '#'*;
fragment DECIMAL_10
: UINTEGER_10 SUFFIX
| '.' DIGIT_10+ '#'* SUFFIX
| DIGIT_10+ '.' DIGIT_10* '#'* SUFFIX
| DIGIT_10+ '#'+ '.' '#'* SUFFIX;
fragment PREFIX_10
: RADIX_10 EXACTNESS
| EXACTNESS RADIX_10;
fragment DIGIT : '0'..'9';
fragment EMPTY : '""'; // empty is the empty string
fragment SUFFIX : EMPTY | EXPONENT_MARKER SIGN DIGIT_10+;
fragment EXPONENT_MARKER : 'e' | 's' | 'f' | 'd' | 'l';
fragment SIGN : EMPTY | '+' | '-';
fragment EXACTNESS : EMPTY | '#i' | '#e';
fragment RADIX_10 : EMPTY | '#d';
fragment DIGIT_10 : DIGIT;
The problem is that it is not recognizing anything. I don’t understand the warning I get from the PREFIX_10 rule, or how to solve it. If I don’t use fragment in the rules, the file doesn’t compile, since ANTLR complains that the DIGIT_10 rule matches the same input as almost all of the prior rules.
It’s the same with NUM_2, NUM_8 and NUM_16.
Also, I am not sure about my solution for the empty string.
How do I get around this?
Note that your ANTLR rule EMPTY (fragment EMPTY : '""';) does not match an empty string, but two double quotes.
But you don’t want a lexer rule that matches only an empty string: that will cause the lexer to go into an infinite loop, since there are infinitely many empty strings in any string/source.
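So BNF rules that contain <empty>, such as <sign> or <suffix>, should not be translated into ANTLR rules with an EMPTY alternative (as in your SIGN : EMPTY | '+' | '-';). Instead, leave the <empty> alternative out of the rule and mark the rule as optional, with ?, wherever it is used.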
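A minimal, untested sketch of that kind of translation for the sign and suffix rules. It assumes the R5RS productions `<sign> --> <empty> | + | -` and `<suffix> --> <empty> | <exponent marker> <sign> <digit 10>+`, and reuses the rule names from the question's grammar:

```antlr
// <sign> --> <empty> | + | -
// The <empty> alternative is dropped, so SIGN always consumes a character.
fragment SIGN : '+' | '-';

// <suffix> --> <empty> | <exponent marker> <sign> <digit 10>+
// Again no <empty> alternative; the optional sign is written as SIGN?.
fragment SUFFIX : EXPONENT_MARKER SIGN? DIGIT_10+;

// Rules that refer to SIGN or SUFFIX mark them optional instead.
fragment REAL_10 : SIGN? UREAL_10;
```

The other references to SUFFIX (in DECIMAL_10) would become SUFFIX? in the same way.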
Also note that your rule COMPLEX_10 is a bit hard to read. Indenting it differently might make it a bit easier to comprehend, and since its '+' and '-' branches are identical, it could be simplified by merging them.
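As an untested sketch of such a simplification: the '+' and '-' branches of COMPLEX_10 in the question are identical, and `UREAL_10 'i' | 'i'` is the same as `UREAL_10? 'i'`, so the rule can be folded down without changing what it matches:

```antlr
fragment COMPLEX_10
  : REAL_10 ( '@' REAL_10
            | ('+' | '-') (UREAL_10? 'i')?
            )?
  | ('+' | '-') (UREAL_10? 'i')?
  ;
```

The `(...)?` around `UREAL_10? 'i'` is kept only to mirror the original rule. R5RS requires the trailing i after a sign in a complex number, so that inner `?` can probably be dropped.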
Also be aware that many BNF notations make no distinction between lower- and uppercase literals. So instead of writing 'i' in your ANTLR grammar, you might want to use ('i' | 'I') instead.
EDIT
There are a couple of things wrong with the (fragment) rule PREFIX_10. For one, both alternatives can match an empty string. Because alternative 1 will always match an empty string, alternative 2 would never be chosen, which is what ANTLR was telling you.
Now, looking at the BNF rules:
(Note that <empty> {#d} equals {#d}, so the <empty> is IMO just misplaced. All other radices don’t have an <empty> part.)
I’d translate those into the following (untested!) ANTLR rules:
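A rough, untested sketch of what such a translation could look like (one possible shape, not necessarily the exact rules meant here). It assumes `<radix 10>` is either `#d` or absent, and handles the absent case with `?` at the places where the rules are used:

```antlr
fragment EXACTNESS : '#i' | '#e';
fragment RADIX_10  : '#d';

// Neither alternative can match the empty string, and they start
// with different tokens, so a lone #d can only be matched by the first one.
fragment PREFIX_10
  : RADIX_10 EXACTNESS?
  | EXACTNESS RADIX_10?
  ;

// The prefix as a whole is optional where it is used.
fragment NUM_10 : PREFIX_10? COMPLEX_10;
```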
Note that the alternatives of PREFIX_10 are written so that only one of them can match a lone #d; otherwise the lexer would not know through which alternative to match #d.
And in case the BNF rule for <radix 10> should be <empty> | #d (i.e. they forgot to place a |), then the ANTLR PREFIX_10 should still look the same, but then all other rules that use PREFIX_10 should make PREFIX_10 optional.
HTH