I’m using ANTLR to parse strings of mathematical expressions and tag them using MathML.
Right now I have the grammar below. Now I have three questions:
- The grammar allows for complete expressions like 2*(3+4). I want it to also allow incomplete expressions, e.g. 2*(3+. Being a complete newbie at ANTLR I have no idea how to accomplish this. Please point me to the right document or give an example.
- The location of the square root rule sqrt among the atomics seems to work but I’m pretty sure it should be somewhere in the exponent rule? Or should it?
- If I want to extend this grammar to also actually perform the calculation, can I somehow reuse it or do I have to copy and paste?
Any other comments or suggestions on my grammar are also appreciated, as my total experience with ANTLR is now about four hours.
grammar Expr;
parse returns [String value]
: stat+ {$value = $stat.value;}
;
stat returns [String value]
: exponent NEWLINE {$value = "<math>" + $exponent.value + "</math>";}
| NEWLINE
;
exponent returns [String value]
: e=expr {$value = $e.value;}
( '^' e=expr {$value = "<msup><mrow>" + $value + "</mrow><mrow>" + $e.value + "</mrow></msup>";}
)*
;
expr returns [String value]
: e=multExpr {$value = $e.value;}
( '+' e=multExpr {$value += "<mo>+</mo>" + $e.value;}
| '-' e=multExpr {$value += "<mo>-</mo>" + $e.value;}
)*
;
multExpr returns [String value]
: e=atom {$value = $e.value;}
( '*' e=atom {$value += "<mo>*</mo>" + $e.value;}
| '/' e=atom {$value += "<mo>/</mo>" + $e.value;}
)*
;
atom returns [String value]
: INT {$value = "<mn>" + $INT.text + "</mn>";}
| '-' e=atom {$value = "<mo>-</mo>" + $e.value;}
| 'sqrt[' exponent ']' {$value = "<msqrt><mrow>" + $exponent.value + "</mrow></msqrt>";}
| '(' exponent ')' {$value = "<mo>(</mo>" + $exponent.value + "<mo>)</mo>";}
;
INT : '0'..'9'+ ;
NEWLINE:'\r'? '\n' ;
WS : (' '|'\t')+ {skip();} ;
First a few remarks about your grammar:
- give the labels inside a rule unique names, e.g. e1=atom ('*' e2=atom ...);
- use separate sqrt and [ tokens instead of 1 single sqrt[, otherwise input like "sqrt [ 9 ]" (a space between sqrt and [) would not be handled properly.
No, the location of sqrt is fine: it should have the highest precedence. Talking of precedence, the usual precedence table (from lowest to highest) in your case would be:
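- addition and subtraction;
- multiplication and division;
- unary minus;
- exponentiation;
- parenthesized expressions (including function-like forms such as sqrt[...]).
Allowing incomplete expressions: that’s tricky.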
I really only see one way: inside your stat rule, you first force the parser to look ahead in the token stream to check if there really is an expr ahead. This can be done using a syntactic predicate. Once the parser is sure there is an expr, only then parse said expression. If there isn’t an expr, try to match a NEWLINE, and if there’s also no NEWLINE, simply consume a single token other than NEWLINE (which must be a part of an incomplete expression!). (I will post a small demo below)
As for reusing the grammar to perform the calculation: ANTLR parser rules can return more than one object. That’s not literally true, of course, since Java methods (which is essentially what parser rules are) can only return a single object. Rather, a parser rule returns an object that holds references to more than one object. So you could do:
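A minimal sketch of that idea, assuming ANTLR 3 syntax with Java actions. The grammar name Pair and the rules line and sum are hypothetical, made up only to show the returns clause; they are not part of your grammar:

```antlr
grammar Pair;  // hypothetical name; the file would have to be called Pair.g

// Start rule: reads both returned fields of 'sum' by name.
line
  : sum NEWLINE { System.out.println($sum.str + " = " + $sum.num); }
  ;

// One rule, two return values: the MathML markup and the computed number.
sum returns [String str, int num]
  : a=INT '+' b=INT
    {
      $str = "<mn>" + $a.text + "</mn><mo>+</mo><mn>" + $b.text + "</mn>";
      $num = Integer.parseInt($a.text) + Integer.parseInt($b.text);
    }
  ;

INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : (' ' | '\t')+ { skip(); } ;
```

Behind the scenes ANTLR 3 generates a small return class for sum with a str and a num field, and that single object is what the Java method actually returns.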
A demo
Taking all my hints into account, a small working demo could look like this:
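What follows is only a sketch under assumptions, not necessarily the exact demo grammar. It assumes ANTLR 3 with Java as the target language. The rule names unaryExpr and expExpr, the SQRT token, and the return names str and num are made up for illustration, and the predicate is written as (expr NEWLINE)=> so that a line is only treated as an expression when it is complete up to the line break:

```antlr
grammar Expr;

// Concatenates the MathML of every statement; num keeps the last statement's value.
parse returns [String str, double num]
@init { $str = ""; }
  : ( stat { $str += $stat.str; $num = $stat.num; } )+
  ;

stat returns [String str, double num]
  : (expr NEWLINE)=> e=expr NEWLINE   // only taken if a complete expr really is ahead
      { $str = "<math>" + $e.str + "</math>"; $num = $e.num; }
  | NEWLINE
      { $str = ""; }
  | ~NEWLINE                          // stray token of an incomplete expression: consume it
      { $str = ""; }
  ;

// Lowest precedence first; each rule delegates to the next-higher level.
expr returns [String str, double num]
  : e1=multExpr { $str = $e1.str; $num = $e1.num; }
    ( '+' e2=multExpr { $str += "<mo>+</mo>" + $e2.str; $num += $e2.num; }
    | '-' e2=multExpr { $str += "<mo>-</mo>" + $e2.str; $num -= $e2.num; }
    )*
  ;

multExpr returns [String str, double num]
  : e1=unaryExpr { $str = $e1.str; $num = $e1.num; }
    ( '*' e2=unaryExpr { $str += "<mo>*</mo>" + $e2.str; $num *= $e2.num; }
    | '/' e2=unaryExpr { $str += "<mo>/</mo>" + $e2.str; $num /= $e2.num; }
    )*
  ;

unaryExpr returns [String str, double num]
  : '-' e=expExpr { $str = "<mo>-</mo>" + $e.str; $num = -$e.num; }
  | e=expExpr     { $str = $e.str; $num = $e.num; }
  ;

// Note: this loop makes '^' left-associative, as in the original grammar.
expExpr returns [String str, double num]
  : e1=atom { $str = $e1.str; $num = $e1.num; }
    ( '^' e2=atom
      { $str = "<msup><mrow>" + $str + "</mrow><mrow>" + $e2.str + "</mrow></msup>";
        $num = Math.pow($num, $e2.num); }
    )*
  ;

atom returns [String str, double num]
  : INT
      { $str = "<mn>" + $INT.text + "</mn>"; $num = Double.parseDouble($INT.text); }
  | SQRT '[' e=expr ']'               // separate tokens, so "sqrt [ 9 ]" also works
      { $str = "<msqrt><mrow>" + $e.str + "</mrow></msqrt>"; $num = Math.sqrt($e.num); }
  | '(' e=expr ')'
      { $str = "<mo>(</mo>" + $e.str + "<mo>)</mo>"; $num = $e.num; }
  ;

SQRT    : 'sqrt' ;
INT     : '0'..'9'+ ;
NEWLINE : '\r'? '\n' ;
WS      : (' ' | '\t')+ { skip(); } ;
```

In this sketch the tokens of an incomplete line are consumed one at a time by the ~NEWLINE alternative and contribute an empty string, so only complete expressions end up wrapped in math tags.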
(note that the (...)=> is this so-called syntactic predicate)
You can test the parser generated from the grammar above with the following class:
And if you now run the class above, you will see that the input
will produce the following output: