How can we distinguish a variable name from an identifier in an ANTLR grammar?
VAR: ('A'..'Z')+ DIGIT* ;
IDENT : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'-')*;
This piece of ANTLR grammar does not work because the compiler complains that IDENT may never be reached for some input. This seems to be a classic headache for compiler writers: the lexer hack.
ANTLR users, could you tell me your neat way to work around it? Thanks.
No, that is not correct. The two lexer rules from the question, placed in a complete grammar, do not produce any error or warning. The lexer simply creates two types of tokens:
when one or more uppercase ASCII letters are matched, optionally followed by DIGITs, a VAR is created;
when a lowercase letter or an underscore is matched, optionally followed by ('a'..'z'|'A'..'Z'|'0'..'9'|'_'|'-')*, an IDENT is created.
Note that an IDENT can therefore never start with an uppercase ASCII letter: that will always become a VAR. So, if you have a parser rule that expects an IDENT and the entire input is "BAR", then there will be a parser error because the lexer will not produce an IDENT token, but a VAR token, even though the parser “asks” for an IDENT. You must understand that no matter what the parser asks from the lexer, the lexer operates independently from the parser.