I am trying to create a very simple ANTLR grammar file which should parse the following file:
Report (MyReport)
Begin
End
Or without report name:
Report
Begin
End
And here is my grammar file:
grammar RL;

options {
  language = Java;
}

report:
  REPORT ('(' SPACE* STRING_LITERAL SPACE* ')')?
  BEGIN
  END
  ;

REPORT
  : 'Report'
  ;

BEGIN
  : 'Begin'
  ;

END : 'End';

NAME: LETTER (LETTER | DIGIT | '_')*;
STRING_LITERAL : NAME SPACE*;

fragment LETTER: LOWER | UPPER;
fragment LOWER: 'a'..'z';
fragment UPPER: 'A'..'Z';
fragment DIGIT: '0'..'9';
fragment SPACE: ' ' | '\t';

WHITESPACE: SPACE+ { $channel = HIDDEN; };

rule: ;
However, when I debug in ANTLRWorks I always get the following error:
root -> report -> MismatchedTokenException(0!=0)
What's wrong with my grammar file?
thanks,
Green
A couple of remarks:
- Java is the default language, so you can omit language=Java;
- you use SPACE inside a parser rule, while this SPACE token is a fragment: the lexer therefore never creates this token, so remove it from your parser rule(s);
- "Report " ("Report" followed by a single white-space) is being tokenized as a STRING_LITERAL, not as a REPORT! ANTLR's lexer consumes characters greedily: the rule that matches the most characters wins, and only when two or more rules match the same number of characters does the rule defined first get precedence. The lexer does not take into account which tokens the parser is trying to match (parsing and tokenization are performed independently!).

Try the following instead:
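A minimal sketch of such a grammar, assuming ANTLR 3 syntax as in the question. It drops SPACE from the parser rule, removes STRING_LITERAL in favour of NAME, and lets the lexer hide all whitespace. Handling '\r' and '\n' in WHITESPACE and ending the rule with EOF are assumptions added here, since the sample input spans several lines:

```antlr
grammar RL;

// Parser rule: the parenthesized report name is optional.
// EOF makes the parser consume the whole input.
report
  :  REPORT ('(' NAME ')')? BEGIN END EOF
  ;

// Keywords are defined before NAME, so they win when both
// rules match the same number of characters.
REPORT : 'Report';
BEGIN  : 'Begin';
END    : 'End';

NAME   : LETTER (LETTER | DIGIT | '_')*;

// Whitespace (including line breaks) goes to the hidden channel,
// so the parser never sees it.
WHITESPACE : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };

fragment LETTER : 'a'..'z' | 'A'..'Z';
fragment DIGIT  : '0'..'9';
```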
I would still skip spaces in the lexer. Accepting spaces between names but ignoring them in other contexts will result in some clunky rules. Instead of accounting for spaces between a report’s name, I would do something like this:
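One way this could look, as a sketch under the assumption that a report name is simply one or more NAME tokens (the report_name rule is the one referred to further down; everything else mirrors the question's lexer rules). Note that with this approach a word such as Begin inside the name would be tokenized as a keyword, not as a NAME:

```antlr
grammar RL;

report
  :  REPORT ('(' report_name ')')? BEGIN END EOF
  ;

// A name is one or more words. The spaces between them are already
// hidden by the lexer, so the parser only sees the NAME tokens.
report_name
  :  NAME+
  ;

REPORT : 'Report';
BEGIN  : 'Begin';
END    : 'End';

NAME   : LETTER (LETTER | DIGIT | '_')*;

// Assumption: line breaks are skipped along with spaces and tabs.
WHITESPACE : (' ' | '\t' | '\r' | '\n')+ { $channel = HIDDEN; };

fragment LETTER : 'a'..'z' | 'A'..'Z';
fragment DIGIT  : '0'..'9';
```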
resulting in the following parse tree:
for the input:
Sure, explicitly add them in the report_name rule: