I'm trying to write a very simple HTML parser with ANTLR, and I'm facing a problem: the ~ rule, which should match everything up to a specified character, is not working.
My lexer grammar:
lexer grammar HtmlParserLexer;
HTML: OHTML PCDATA CHTML;
PCDATA :(~'<') ; //match all until <
OHTML: '<html>';
CHTML: '</html>';
I'm trying to match:
<html>foo bar</html>
Error from Eclipse ANTLR plugin Interpreter:
MismatchedTokenException: line 1:7 mismatched input UNKNOW expecting '<'
Which means that my grammar ignores the PCDATA rule, and I don't know why.
Thanks in advance for your help.
The rule
PCDATA : (~'<') ;
matches a single character other than '<'. You'll need to repeat it once or more:
PCDATA : (~'<')+ ;
(notice the +).
You may also want to allow
<html></html>
(nothing in between <html> and </html>). In that case, you shouldn't change PCDATA : (~'<')+ ; into PCDATA : (~'<')* ;. Instead, keep the + and make the reference to PCDATA optional in the HTML rule (PCDATA?), because you shouldn't create lexer rules that could potentially match an empty string.
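Putting both suggestions together, the lexer grammar from the question might look like the sketch below. It assumes the ANTLR v3 syntax used in the question and keeps the original rule names. Making PCDATA optional inside HTML is one reading of how to allow an empty body while keeping PCDATA itself non-empty, and the grammar has not been run through the Eclipse interpreter.

```
lexer grammar HtmlParserLexer;

// PCDATA is optional here, so <html></html> is accepted
// without PCDATA itself ever matching an empty string.
HTML   : OHTML PCDATA? CHTML ;

// One or more characters other than '<'.
PCDATA : (~'<')+ ;

OHTML  : '<html>' ;
CHTML  : '</html>' ;
```

With this version, both <html>foo bar</html> and <html></html> should be matched by HTML. The only changes from the question's grammar are the + on PCDATA and the ? on its use in HTML.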