I’m trying to parse a simple key = value query language. I’ve actually accomplished it with a huge monstrosity of a parser that I then make a second pass through to clean up the parse tree. What I’d like to do is make a clean parse from the bottom up, which includes things like using sets for the (key, val) pairs so redundant pairs are eliminated. While I got it working before, I don’t feel like I fully understood why pyparsing was acting the way it was, so I did a lot of workarounds, sort of fighting against the grain.
Currently, here is the beginning of my “simplified” parser:
from pyparsing import *

bool_act = lambda t: bool(t[0])
int_act = lambda t: int(t[0])

def keyval_act(instring, loc, tokens):
    return set([(tokens.k, tokens.v)])

def keyin_act(instring, loc, tokens):
    return set([(tokens.k, set(tokens.vs))])

string = (
    Word(alphas + '_', alphanums + '_')
    | quotedString.setParseAction( removeQuotes )
    )
boolean = (
    CaselessLiteral('true')
    | CaselessLiteral('false')
    )
integer = Word(nums).setParseAction( int_act )
value = (
    boolean.setParseAction(bool_act)
    | integer
    | string
    )
keyval = (string('k') + Suppress('=') + value('v')
    ).setParseAction(keyval_act)
keyin = (
    string('k') + Suppress(CaselessLiteral('in')) +
    nestedExpr('{','}', content = delimitedList(value)('vs'))
    ).setParseAction(keyin_act)

grammar = keyin + stringEnd | keyval + stringEnd
Currently, the “grammar” nonterminal is just a stub; I will eventually add nestable conjunctions and disjunctions to the keys so that searches like this can be parsed:
a = 1, b = 2 , c in {1,2,3} | d = 4, ( e = 5 | e = 2, (f = 3, f = 4))
For now though, I am having trouble understanding how pyparsing calls my setParseAction functions. I know there is some magic in terms of how many arguments are passed, but I am getting an error where no arguments are being passed to the function at all. So currently, if I do:
grammar.parseString('hi in {1,2,3}')
I get this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 1021, in parseString
    loc, tokens = self._parse( instring, 0 )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 2478, in parseImpl
    ret = e._parse( instring, loc, doActions )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 894, in _parseNoCache
    loc,tokens = self.parseImpl( instring, preloc, doActions )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 2351, in parseImpl
    loc, resultlist = self.exprs[0]._parse( instring, loc, doActions, callPreParse=False )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 921, in _parseNoCache
    tokens = fn( instring, tokensStart, retTokens )
  File "/usr/lib/python2.6/site-packages/pyparsing.py", line 675, in wrapper
    return func(*args[limit[0]:])
TypeError: keyin_act() takes exactly 3 arguments (0 given)
As you can see from the traceback, I’m using Python 2.6 and pyparsing 1.5.6.
Can anyone give me some insight into why the function isn’t getting the right number of arguments?
Well, the latest version of setParseAction does do some extra magic, but unfortunately at the expense of some development simplicity. The argument detection logic in setParseAction now relies on exceptions being raised when the parse action is called: it keeps retrying until the call succeeds with the correct number of arguments, starting at 3 and working its way down to 0, after which it just gives up and raises the exception you saw.
Except in this case, the exception coming from the parse action was not due to an argument list mismatch, but was a real error in your code. To get a better view of this, insert a generic try-except into your parse action:
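A minimal sketch of both points, written in Python 3 and without pyparsing so that it runs on its own. `trim_arity_model` is a hypothetical, much-simplified model of the argument-detection wrapper (not pyparsing's actual code), and `SimpleNamespace` is only a stand-in for the parsed tokens. It shows how the real error gets replaced by an argument-count complaint, and how a try-except inside the action brings it back into view:

```python
from types import SimpleNamespace


def trim_arity_model(func, max_args=3):
    """Hypothetical, simplified model of the retry logic (not pyparsing's code)."""
    def wrapper(*args):
        for skip in range(max_args + 1):
            try:
                # Drop `skip` leading arguments and call the action again.
                return func(*args[skip:])
            except TypeError:
                if skip == max_args:
                    raise  # nothing left to drop, so the last TypeError escapes
    return wrapper


def keyin_act(instring, loc, tokens):
    # Real bug: a set is unhashable, so it cannot be stored inside another set.
    return set([(tokens.k, set(tokens.vs))])


seen = []  # messages caught by the debugging version below


def keyin_act_debug(instring, loc, tokens):
    # Same body, wrapped in a generic try-except to expose the real error.
    try:
        return set([(tokens.k, set(tokens.vs))])
    except Exception as exc:
        seen.append(str(exc))
        print(exc)


# Stand-in for pyparsing's tokens (assumption: only .k and .vs matter here).
tokens = SimpleNamespace(k="hi", vs=[1, 2, 3])

try:
    trim_arity_model(keyin_act)("hi in {1,2,3}", 0, tokens)
except TypeError as exc:
    masked = str(exc)  # an argument-count complaint, not the hashing error
    print(masked)

trim_arity_model(keyin_act_debug)("hi in {1,2,3}", 0, tokens)
```

In the model, the hashing error is a TypeError too, so the wrapper mistakes it for a signature mismatch and keeps dropping arguments until none are left. Catching the exception inside the action sidesteps that.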
And you get:
unhashable type: 'set'
In fact, the second element of the tuple from which you are creating the return set is itself a set, a mutable container, and thus not hashable for inclusion in another set. If you change this to use a frozenset instead, the parse action runs without error, but the frozenset in the result comes back empty.
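The hashability difference is easy to check on its own, independent of pyparsing. A small sketch, with literal values standing in for the parsed tokens:

```python
values = [1, 2, 3]

# A set is mutable and therefore unhashable, so this pair cannot go into a set.
try:
    bad = set([("hi", set(values))])
except TypeError as exc:
    error = str(exc)

# A frozenset is immutable and hashable, so the same construction works.
good = set([("hi", frozenset(values))])

# Equal pairs collapse into one entry, which is the point of using sets here.
deduped = set([("hi", frozenset([1, 2, 3])), ("hi", frozenset([3, 2, 1]))])
```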
Why is the frozenset empty? I suggest you move your results name ‘vs’ from the delimitedList to the enclosing nestedExpr:
nestedExpr('{','}', content = delimitedList(value))('vs')
And now the parsed results returned by parsing ‘hi in {1,2,3}’ are no longer empty, but they are something of a mess. If we drop a line such as print tokens.dump() at the top of your parse action, we can see what the different named results actually contain.
So ‘vs’ actually points to a list containing a list, and we probably want to build our set from tokens.vs[0], not tokens.vs. With that change, the frozenset in the parsed results holds the values 1, 2 and 3.
Some other tips on your grammar:
Instead of CaselessLiteral, try using CaselessKeyword. Keywords are a better choice for grammar keywords, since they inherently avoid mistaking the leading ‘in’ of ‘inside’ for the keyword ‘in’ in your grammar.
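A short sketch of the difference, assuming pyparsing is installed and using the same legacy method names as the question (`parseString`, `asList`); the input strings are only illustrations:

```python
from pyparsing import CaselessKeyword, CaselessLiteral, ParseException

in_literal = CaselessLiteral('in')
in_keyword = CaselessKeyword('in')

# The literal happily matches the first two letters of a longer word.
literal_tokens = in_literal.parseString('inside').asList()

# The keyword refuses, because 'in' is followed by more identifier characters.
try:
    in_keyword.parseString('inside')
    keyword_matched = True
except ParseException:
    keyword_matched = False

# A standalone 'IN' is still accepted, in any letter case.
standalone_tokens = in_keyword.parseString('IN {1,2,3}').asList()
```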
Not sure where you are heading with returning sets from the parse actions – for key-value pairs, a tuple will probably be better, since it will preserve the order of tokens. Build up your sets of keys and values in the after-parsing phase of the program.
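A small sketch of that split, with a hard-coded list standing in for the tuples a parse action would return (the values are made up for illustration):

```python
# Hypothetical parse output: one (key, value) tuple per parsed pair, in input order.
parsed_pairs = [('a', 1), ('b', 2), ('a', 1), ('c', frozenset([1, 2, 3]))]

# After parsing, collapse redundant pairs into a set...
unique_pairs = set(parsed_pairs)

# ...or drop duplicates while keeping the order of first appearance
# (dict preserves insertion order in Python 3.7+).
ordered_unique = list(dict.fromkeys(parsed_pairs))
```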
For other grammar debugging tools, check out setDebug and the traceParseAction decorator.