How to use lexer rules that start with the same characters?
I am trying to use two similar lexer rules (they start with the same characters):
TIMECONSTANT: ('0'..'9')+ ':' ('0'..'9')+;
INTEGER : ('0'..'9')+;
COLON : ':';
Here is my sample grammar:
grammar TestTime;
text : (timeexpr | caseblock)*;
timeexpr : TIME;
caseblock : INT COLON ID;
TIME : ('0'..'9')+ ':' ('0'..'9')+;
INT : ('0'..'9')+;
COLON : ':';
ID : ('a'..'z')+;
WS : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
When I try to parse the text:
12:44
123 : abc
123: abc
The first two lines are parsed correctly, but the third generates an error.
For some reason, ANTLR tries to parse '123:' as TIME (which it is not).
So, is it possible to write a grammar with such lexemes?
Having such rules is necessary in my language, which uses both case blocks and datetime constants. For example, in my language it is possible to write:
case MyInt of
1: a := 01.01.2012;
2: b := 12:44;
3: ....
end;
As soon as DIGIT+ ':' is matched, the lexer expects this to be followed by another DIGIT to match a TIMECONSTANT. If this does not happen, it cannot fall back on another lexer rule that matches DIGIT+ ':', and the lexer will not give up the already matched ':' to match an INTEGER.
A possible solution would be to optionally match ':' DIGIT+ at the end of the INTEGER rule and change the type of the token if this gets matched:
When parsing the input:
the following will be printed:
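As an illustration of the retyping approach described in this answer, here is a minimal ANTLR 3 sketch. It is not necessarily the exact grammar the answer used. The names INTEGER, TIMECONSTANT, COLON and DIGIT come from the text, while the parse entry rule, the SPACE rule and the empty fragment that declares the TIMECONSTANT token type are assumptions made for this example.

```antlr
grammar T;

// Hypothetical entry rule: accepts any sequence of the three token types,
// so the lexer behaviour can be tried on its own.
parse
  : ( INTEGER | TIMECONSTANT | COLON )* EOF
  ;

// Match the digits first. Only when a ':' that is immediately followed by a
// digit comes next (checked by the syntactic predicate) is the ':' consumed
// as well, and the token is retyped to TIMECONSTANT.
INTEGER
  : DIGIT+
    ( (':' DIGIT)=> ':' DIGIT+ {$type=TIMECONSTANT;} )?
  ;

COLON : ':' ;

// Assumption: whitespace is hidden from the parser, as in the question's WS rule.
SPACE : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;

fragment DIGIT        : '0'..'9' ;

// Empty fragment: never matched by itself, it only declares the token type
// that the INTEGER rule assigns.
fragment TIMECONSTANT : ;
```

With a rule shaped like this, 12:44 should come out as a single TIMECONSTANT, whereas 123: followed by a space stays an INTEGER followed by a COLON, because the predicate only succeeds when a digit directly follows the colon.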
EDIT
True. However, this is not an ANTLR shortcoming: most lexer generators I know will have a problem properly tokenizing such a TIMECONSTANT (when INTEGER and COLON are also present). ANTLR at least facilitates a way to handle it in the lexer 🙂
You could also let this be handled by the parser instead of the lexer:
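A minimal ANTLR 3 sketch of the parser-side alternative, reusing the token rules from the question's TestTime grammar. The time_const rule name is the one this answer refers to; its body, and the EOF added to the text rule, are assumptions made for this example.

```antlr
grammar TestTime;

// EOF is added here (an assumption) so that the whole input must be consumed.
text       : (time_const | caseblock)* EOF ;

// The time constant is assembled in the parser from three ordinary tokens.
time_const : INT COLON INT ;
caseblock  : INT COLON ID ;

// No TIME lexer rule any more: INT and COLON never compete for the same prefix.
INT   : ('0'..'9')+ ;
COLON : ':' ;
ID    : ('a'..'z')+ ;
WS    : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;} ;
```

Here the lexer always emits INT, then COLON, and the parser chooses between time_const and caseblock by looking at the third token (INT or ID).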
However, if your language’s lexer ignores whitespace, then input like:
would also be matched by the time_const rule, of course.