I’m using ANTLR to tokenize a simple grammar, and need to differentiate between an ID:
ID : LETTER (LETTER | DIGIT)* ;
fragment DIGIT : '0'..'9' ;
fragment LETTER : 'a'..'z' | 'A'..'Z' ;
and a RESERVED_WORD:
RESERVED_WORD : 'class' | 'public' | 'static' | 'extends' | 'void' | 'int' | 'boolean' | 'if' | 'else' | 'while' | 'return' | 'null' | 'true' | 'false' | 'this' | 'new' | 'String' ;
Say I run the lexer on the input:
class abc
I receive two ID tokens for “class” and “abc”, while I want “class” to be recognized as a RESERVED_WORD. How can I accomplish this?
Whenever two (or more) rules match the same number of characters, the one defined first will “win”. So, if you define RESERVED_WORD before ID, the input "class" will be tokenized as a RESERVED_WORD.
Note that it doesn’t make a lot of sense to create a single token that matches any reserved word. Usually each reserved word gets its own token rule, defined before ID.
Now "false" will become a FALSE token, and "falser" an ID: the lexer prefers the longest match, and only ID can match all six characters of "falser".