I am working with pyparsing and found it to be excellent for developing a simple DSL that allows me to extract data fields out of MongoDB and do simple arithmetic operations on them. I am now trying to extend my tools such that I can apply functions of the form Rank[Person:Height] to the fields and potentially include simple expressions as arguments to the function calls. I am struggling hard with getting the parsing syntax to work. Here is what I have so far:
# Define parser
expr = Forward()
integer = Word(nums).setParseAction(EvalConstant)
real = Combine(Word(nums) + "." + Word(nums)).setParseAction(EvalConstant)
# Handle database field references that are coming out of Mongo,
# accounting for the fact that some fields contain whitespace
dbRef = Combine(Word(alphas) + ":" + Word(printables) + \
Optional(" " + Word(alphas) + " " + Word(alphas)))
dbRef.setParseAction(EvalDBref)
# Handle function calls
functionCall = (Keyword("Rank") | Keyword("ZS") | Keyword("Ntile")) + "[" + expr + "]"
functionCall.setParseAction(EvalFunction)
operand = functionCall | dbRef | (real | integer)
signop = oneOf('+ -')
multop = oneOf('* /')
plusop = oneOf('+ -')
# Use parse actions to attach Eval constructors to sub-expressions
expr << operatorPrecedence(operand,
[
(signop, 1, opAssoc.RIGHT, EvalSignOp),
(multop, 2, opAssoc.LEFT, EvalMultOp),
(plusop, 2, opAssoc.LEFT, EvalAddOp),
])
My issue is that when I test a simple expression like Rank[Person:Height] I am getting a parse exception:
ParseException: Expected "]" (at char 19), (line:1, col:20)
If I use a float or arithmetic expression as the argument like Rank[3 + 1.1] the parsing works ok, and if I simplify the dbRef grammar so its just Word(alphas) it also works. Cannot for the life of me figure out whats wrong with my full grammar. I have tried rearranging the order of operands as well as simplifying the functionCall grammar to no avail. Can anyone see what I am doing wrong?
Once I get this working I would want to take a last step and introduce support for variable assignment in expressions ..
EDIT: Upon further testing, if I remove the printables from dbRef grammar, things work ok:
dbRef = Combine(Word(alphas) + OneOrMore(":") + Word(alphanums) + \
Optional("_" + Word(alphas)))
HOWEVER, if I add the character “-” to dbRef (which I need for DB fields like “Class:S-N”), the parser fails again. I think the “-” is being consumed by the signop in my operatorPrecedence?
What appears to happen is that the
]character at the end of your test string (Rank[Person:Height]) gets consumed as part of thedbReftoken, because the portion of this token past the initial:is declared as being made of Word(printables) (and this character set, unfortunately includes the square brackets characters)Then the parser tries to produce a
functionCallbut is missing the closing]hence the error message.A tentative fix is to use a character set that doesn’t include the square brackets, maybe something more explicit like:
Edit:
Upon closer look, the above is loosely correct, but the token hierarchy is wrong (e.g. the parser attempts to produce a
functionCallas one operand of an anexpretc.)Also, my suggested fix will not work because of the ambiguity with the
-sign which should be understood as a plain character when within adbRefand as a plusOp when within anexpr. This type of issue is common with parsers and there are ways to deal with this, though I’m not sure exactly how with pyparsing.