For a markup language I’m trying to parse, I decided to give parser generation a try with ANTLR. I’m new to the field, and I’m messing something up.
My grammar is
grammar Test;
DIGIT : ('0'..'9');
LETTER : ('A'..'Z');
SLASH : '/';
restriction
: ('E' ap)
| ('L' ap)
| 'N';
ap : LETTER LETTER LETTER;
car : LETTER LETTER;
fnum : DIGIT DIGIT DIGIT DIGIT? LETTER?;
flt : car fnum?;
message : 'A' (SLASH flt)? (SLASH restriction)?;
This does exactly what I want when I give it the input string A/KK543/EPOS. When I give it A/KL543/EPOS, however, it fails with MismatchedTokenException(9!=5). It looks like some sort of conflict: the parser seems to want to start a restriction at the first L. So I’m presumably doing something wrong in the grammar definition, but I can’t work out what.
For the input "A/KK543/EPOS", the following tokens are created: 'A', SLASH, LETTER, LETTER, DIGIT, DIGIT, DIGIT, SLASH, 'E', LETTER, LETTER, LETTER.
But for the input "A/KL543/EPOS", these are created: 'A', SLASH, LETTER, 'L', DIGIT, DIGIT, DIGIT, SLASH, 'E', LETTER, LETTER, LETTER.
As you can see, the char 'L' does not get tokenized as a LETTER. For the literal tokens 'A', 'E', 'L' and 'N' inside your parser rules, ANTLR automatically creates separate lexer rules that are placed before all other lexer rules. Behind the scenes, your lexer therefore behaves as if rules for 'A', 'E', 'L' and 'N' were defined ahead of DIGIT, LETTER and SLASH.
Therefore, a single 'A', 'E', 'L' or 'N' will never become a LETTER token. This is simply how ANTLR works. If you want to match them as letters, you’ll need to create a parser rule letter and let it match these tokens too, and then use letter in place of LETTER in your other parser rules. With that change, the input "A/KL543/EPOS" parses.