I am trying to parse a small expression language (it comes from a vendor; I didn’t define it), and everything works until I try to use the not operator, which is a tilde in this language.
My grammar has been heavily influenced by these two links (aka shameless cut and pasting):
http://www.codeproject.com/KB/recipes/sota_expression_evaluator.aspx
http://www.alittlemadness.com/2006/06/05/antlr-by-example-part-1-the-language
The language consists of three expression types that can be combined with the and, or, and not operators, with parentheses to change precedence. The expressions are:
Skill("name") > some_number (can also be <, >=, <=, =, !=)
SkillExists("name")
LoggedIn("name") (this one can also have name@name)
This input works fine:
Skill("somename") > 1 | (LoggedIn("somename") & SkillExists("othername"))
However, as soon as I try to use the not operator I get a NoViableAltException, and I can’t figure out why. I have compared my grammar to the ECalc.g one at the codeproject.com link and they seem to match, so there must be some subtle difference I can’t see. This input fails:
Skill("somename") < 10 ~ SkillExists("othername")
My Grammar:
grammar UserAttribute;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
SKILL = 'Skill' ;
SKILL_EXISTS = 'SkillExists' ;
LOGGED_IN = 'LoggedIn';
GT = '>';
LT = '<';
LTE = '<=';
GTE = '>=';
EQUALS = '=';
NOT_EQUALS = '!=';
AND = '&';
OR = '|' ;
NOT = '~';
LPAREN = '(';
RPAREN = ')';
QUOTE = '"';
AT = '@';
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
expression : orexpression EOF!;
orexpression : andexpression (OR^ andexpression)*;
andexpression : notexpression (AND^ notexpression)*;
notexpression : primaryexpression | NOT^ primaryexpression;
primaryexpression : term | LPAREN! orexpression RPAREN!;
term : skill_exists | skill | logged_in;
skill_exists : SKILL_EXISTS LPAREN QUOTE NAME QUOTE RPAREN;
logged_in : LOGGED_IN LPAREN QUOTE NAME (AT NAME)? QUOTE RPAREN;
skill: SKILL LPAREN QUOTE NAME QUOTE RPAREN ((GT | LT| LTE | GTE | EQUALS | NOT_EQUALS)? NUMBER*)?;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
NAME : ('a'..'z' | 'A'..'Z' | '_')+;
NUMBER : ('0'..'9')+ ;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; } ;
I have 2 remarks:
1
Since you’re parsing single expressions (expression : orexpression EOF!;), the input "Skill("somename") < 10 ~ SkillExists("othername")" is not only invalid in your grammar, it’s invalid for any expression parser I know of. A notexpression only takes a right-hand-side expression, so ~ SkillExists("othername") is a single expression and Skill("somename") < 10 is also a single expression. But between those two single expressions there’s no OR or AND operator. It would be the same as evaluating the expression "true false" instead of "true | false" or "true and false".
In short, your grammar disallows:
but allows for:
which seems logical to me.
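To make the difference concrete, here is the failing input from the question next to a variant with an explicit binary operator, which the grammar as posted should accept. The choice of & is only my assumption about the intended meaning; | would fit the grammar just as well.

```text
Skill("somename") < 10 ~ SkillExists("othername")      (no operator between the two expressions: rejected)
Skill("somename") < 10 & ~ SkillExists("othername")    (explicit AND before the NOT: two operands joined by &)
```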
2
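I don’t quite understand your skill rule (which is ambiguous, btw):
skill: SKILL LPAREN QUOTE NAME QUOTE RPAREN ((GT | LT| LTE | GTE | EQUALS | NOT_EQUALS)? NUMBER*)?;
This means that the operator is optional and there can be zero or more numbers at the end, so the following inputs are all valid:
Skill("foo") = 10 20
Skill("foo") 10 20 30
Skill("foo") <
Perhaps you meant:
skill: SKILL LPAREN QUOTE NAME QUOTE RPAREN ((GT | LT| LTE | GTE | EQUALS | NOT_EQUALS)^ NUMBER)?;
instead? (the ? becomes a ^ and the * is removed)
If I only change that rule and parse the input: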
the following AST is created:
(as you can see, the AST needs to be better formed: i.e. you need some rewrite rules in your skill_exists, logged_in and skill rules)
EDIT
And if you want successive expressions to have implied AND tokens in between, do something like this:
As you can see, since the AND is now optional, it cannot be used inside a rewrite rule; you’ll have to use the imaginary token I_AND instead.
If you now parse the input:
you will get the following AST:
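For reference, a rough sketch of the optional-AND idea described above, assuming ANTLR v3 tree-rewrite syntax. Only the name I_AND comes from the text; the rule body and the label n are my own reconstruction and may differ from the rule actually used.

```antlr
// Add to the existing tokens { ... } block. I_AND is an imaginary token:
// it matches no input text and exists only to serve as an AST root.
//   I_AND;

// Start the tree with the first operand, then fold each further operand
// under an I_AND root, whether or not a literal '&' was present.
andexpression
  : (notexpression -> notexpression)
    (AND? n=notexpression -> ^(I_AND $andexpression $n))*
  ;
```

Here $andexpression refers to the tree built so far for the rule, so a chain of operands nests to the left, the same shape that AND^ produced in the original rule.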