Some keywords (string constants) in my grammar contain capital letters, e.g.
PREV_VALUE : 'PreviousValue';
This causes strange parsing behavior: other tokens that contain the same capital letters ('P', 'V') are parsed incorrectly.
Here’s a simplified version of the lexer grammar:
lexer grammar ExpressionLexer;
COMMA : ',';
LPAREN : '(';
RPAREN : ')';
LBRACK : '[';
RBRACK : ']';
PLUS : '+';
MINUS : '-';
MULT : '*';
DIV : '/';
PREV_VALUE : 'PreviousValue';
fragment DIGIT : ('0'..'9');
fragment LETTER : ('a'..'z'|'A'..'Z'|'_');
fragment TAB : ('\t') ;
fragment NEWLINE : ('\r'|'\n') ;
fragment SPACE : (' ') ;
When I try to parse an expression like this:
var expression = "P"; // capital 'P', which also occurs in the keyword 'PreviousValue'
var stringReader = new StringReader(expression);
var input = new ANTLRReaderStream(stringReader);
var expressionLexer = new ExpressionLexer(input);
var tokens = new CommonTokenStream(expressionLexer);
the tokens._tokens collection contains one value:
[0] = {[@0,1:1='<EOF>',<-1>,1:1]}
It’s incorrect.
If I change the expression to 'p' (a lowercase letter), the tokens._tokens collection contains two values:
[0] = {[@0,0:0='p',<0>,1:0]}
[1] = {[@1,1:1='<EOF>',<-1>,1:1]}
It’s correct.
When the line PREV_VALUE : 'PreviousValue'; is removed from the grammar, both expressions are parsed correctly.
Is it possible to use mixed-case keywords?
Is there an example of such keywords in an ANTLR grammar?
I find it hard to believe a p token is created based on the grammar you posted. Lexer rules that have fragment in front of them will not produce tokens: these rules are only used by other lexer rules.
A simple demo shows this:
Now generate the lexer and compile the .java source file:
and run a few tests:
which is correct since there is no (non-fragment) rule that starts with, or matches, a "p".
which is correct since the only (non-fragment) rule that starts with a "P" expects an "r" to be the next character (which isn’t there).