I am working on an ANTLR grammar to parse polynomial functions in multiple variables using Java. Examples of legal input are:
42; X; +42X; Y^42; 1337HelloWorld; 13,37X^42;
The following grammar compiles without warnings or errors:
grammar Function;
parseFunction returns [java.util.List<java.util.List<Object>> list] :
{ list = new java.util.ArrayList(); } ( f=functionPart { list.add($f.list); } )+
| { list = new java.util.ArrayList(); } ( fb=functionBegin ) { list.add($fb.list); } ( f=functionPart { list.add($f.list); } )*
;
functionBegin returns [java.util.List<Object> list]:
m=NUMBER v=VARIABLE e=exponent { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); list.add($v.text); list.add($e.value); }
| m=NUMBER v=VARIABLE { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); list.add($v.text); }
| v=VARIABLE e=exponent { list = new java.util.ArrayList(); list.add("+"); list.add("1"); list.add($v.text); list.add($e.value); }
| v=VARIABLE { list = new java.util.ArrayList(); list.add("+"); list.add("1"); list.add($v.text); }
| m=NUMBER { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); }
;
functionPart returns [java.util.List<Object> list] :
s=SIGN m=NUMBER v=VARIABLE e=exponent { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); list.add($v.text); list.add($e.value); }
| s=SIGN m=NUMBER v=VARIABLE { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); list.add($v.text); }
| s=SIGN v=VARIABLE e=exponent { list = new java.util.ArrayList(); list.add($s.text); list.add("1"); list.add($v.text); list.add($e.value); }
| s=SIGN v=VARIABLE { list = new java.util.ArrayList(); list.add($s.text); list.add("1"); list.add($v.text); }
| s=SIGN m=NUMBER { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); }
;
exponent returns [int value]: ('^' n=INTEGER) { $value = 1; if ( $n != null && $n.text.length() > 0) $value = Integer.parseInt($n.text); }
;
VARIABLE : ('a'..'z'|'A'..'Z')+
;
INTEGER : ('0'..'9')+
;
NUMBER : ('0'..'9')+ (','('0'..'9')+)?
;
SIGN : ('+'|'-')
;
WS : (' ' | '\t' | '\r'| '\n')+ {skip();}
;
When compiled and used from Java, this grammar accepts most input values, but not all valid ones. As soon as a number without a comma appears, as in the inputs
+42; 42; 42X^1337;
an error is thrown (error from input “+42”):
line 1:1 no viable alternative at input '+'
The error does not occur if I change the inputs to
+42,0; 42,0; 42,0X^1337
Can anyone explain why this happens and how to fix it?
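The lexer prefers the longest match, and when two lexer rules match the same text, the rule defined first wins. Thus 42 is an INTEGER (INTEGER is defined before NUMBER), and NUMBER in fact only matches when the comma part is present, i.e. when NUMBER has a longer match than INTEGER. For the input +42 the parser therefore receives SIGN INTEGER, which none of the alternatives expecting NUMBER can accept.
This can be fixed by adding a parser rule that accepts either INTEGER or NUMBER and using that instead of NUMBER in the other parser rules.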
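The fix described above might be sketched as follows. This is only an illustration: the rule name `number` is an assumption, and only one alternative of `functionPart` is shown. The remaining alternatives of `functionPart` and `functionBegin` would need the same substitution.

```antlr
// Hypothetical parser rule (the name "number" is an assumption).
// It accepts either token type, so both "42" and "42,0" work as coefficients.
number
    : INTEGER
    | NUMBER
    ;

// Using the parser rule in place of the NUMBER token.
// Only the last alternative of functionPart is shown here.
functionPart returns [java.util.List<Object> list] :
    // ... the other alternatives, changed the same way ...
    s=SIGN m=number { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); }
    ;
```

Because `m` now labels a parser rule rather than a token, `$m.text` yields the text matched by that rule, so the actions should not need further changes. The `exponent` rule already expects INTEGER and is unaffected.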