I’m building a syntax parser to perform simple actions on objects identified using dotted notation, something like this:
DISABLE ALL;
ENABLE A.1 B.1.1 C
but in DISABLE ALL, the keyword ALL is instead matched as three separate tokens ('A', 'L', 'L') by the Regex(r'[a-zA-Z]') I use to match arguments.
How can I make a Word using a regex? AFAIK I can’t match A.1.1 using Word.
Please see the example below:
import pyparsing as pp

def toggle_item_action(s, loc, tokens):
    'enable / disable a sequence of items'
    action = tokens[0].lower() == "enable"
    for token in tokens[1:]:
        print("it[%s].active = %s" % (token, action))

def toggle_all_items_action(s, loc, tokens):
    'enable / disable ALL items'
    action = tokens[0].lower() == "enable"
    print("it.enable_all(%s)" % action)

expr_separator = pp.Suppress(';')

#match A
area = pp.Regex(r'[a-zA-Z]')
#match A.1
category = pp.Regex(r'[a-zA-Z]\.\d{1,2}')
#match A.1.1
criteria = pp.Regex(r'[a-zA-Z]\.\d{1,2}\.\d{1,2}')
#match any of the above
item = area ^ category ^ criteria

#keyword to perform action on ALL items
all_ = pp.CaselessLiteral("all")

#actions
enable = pp.CaselessKeyword('enable')
disable = pp.CaselessKeyword('disable')
toggle = enable | disable

#toggle item expression
toggle_item = (toggle + item + pp.ZeroOrMore(item)
               ).setParseAction(toggle_item_action)
#toggle ALL items expression
toggle_all_items = (toggle + all_).setParseAction(toggle_all_items_action)

#swapping the order to `toggle_all_items ^ toggle_item` works,
#but it seems too weak to me and error prone for future maintenance
expr = toggle_item ^ toggle_all_items
#expr = toggle_all_items ^ toggle_item

more = expr + pp.ZeroOrMore(expr_separator + expr)
more.parseString("""
ENABLE A.1 B.1.1;
DISABLE ALL
""", parseAll=True)
Is this the problem?
Should be:
EDIT – if you’re interested…
Your regexes are so similar, I thought I’d see what it would look like to combine them into one. Here is a snippet to parse out your three dotted notations using a single Regex, and then using a parse action to figure out which type you got:
Prints:
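A single-Regex version along these lines might look like the following minimal sketch. The combined pattern, the KINDS labels, and the classify helper are assumptions made for illustration, not the exact snippet referred to above; the parse action simply counts the dots to decide which of the three notations was matched.

```python
import pyparsing as pp

# One pattern for A, A.1 and A.1.1. The trailing \b is an added assumption:
# it stops the pattern from matching just the first letter of a longer word.
dotted = pp.Regex(r'[a-zA-Z](?:\.\d{1,2}){0,2}\b')

# Hypothetical labels, indexed by the number of dots in the match.
KINDS = ("area", "category", "criteria")

def classify(tokens):
    # Replace the matched text with a "kind:text" string.
    text = tokens[0]
    return "%s:%s" % (KINDS[text.count('.')], text)

dotted.setParseAction(classify)

result = pp.OneOrMore(dotted).parseString("A B.1 C.1.1").asList()
print(result)
```

Returning a plain string keeps the sketch simple; a parse action could just as well attach the kind as a results name.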
EDIT2 – I see the original problem…
What is messing you up is pyparsing’s implicit whitespace skipping. Pyparsing will skip over whitespace between defined tokens, but the converse is not true – pyparsing does not require whitespace between separate parser expressions. So in your all_-less version, “ALL” looks like 3 areas, “A”, “L”, and “L”. This is true not just of Regex, but just about any pyparsing class. See if the pyparsing WordEnd class might be useful in enforcing this.
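To make the whitespace point concrete, here is a small sketch comparing the question's single-letter Regex with the same Regex followed by WordEnd. The names loose_area and strict_area are made up for this illustration, and passing pp.alphanums as the word characters is an assumption about what should count as part of a word here.

```python
import pyparsing as pp

# As in the question: nothing requires whitespace between matches,
# so three single letters are happily read out of "ALL".
loose_area = pp.Regex(r'[a-zA-Z]')
loose_tokens = pp.OneOrMore(loose_area).parseString("ALL").asList()

# Same pattern, but WordEnd insists that the letter is not immediately
# followed by another letter or digit.
strict_area = pp.Regex(r'[a-zA-Z]') + pp.WordEnd(pp.alphanums)
strict_tokens = pp.OneOrMore(strict_area).parseString("A B C").asList()

def strict_accepts(text):
    # True if the strict version can parse the start of the text.
    try:
        pp.OneOrMore(strict_area).parseString(text)
        return True
    except pp.ParseException:
        return False

print(loose_tokens)
print(strict_tokens)
print(strict_accepts("ALL"))
```

WordEnd contributes no tokens of its own, so the strict version still yields just the letters.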
EDIT3 – Then maybe something like this…
The way your commands are formatted, you have to make the parser first see if ALL is being toggled before looking for individual areas, etc. If you need to support something that might read “ENABLE A.1 ALL”, then use a negative lookahead for item: item = ~all_ + (area ^ etc...). (Note also that I replaced item + pp.ZeroOrMore(item) with just pp.OneOrMore(item).)
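Putting those suggestions together, a sketch of the whole parser could look like this. It reuses the question's expressions, but three things are assumptions of this sketch: all_ is defined as a CaselessKeyword, the ALL alternative is tried first with |, and the parse actions append to a hypothetical log list instead of printing so the outcome is easy to inspect.

```python
import pyparsing as pp

log = []  # hypothetical stand-in for acting on real objects

def toggle_item_action(s, loc, tokens):
    'enable / disable a sequence of items'
    active = tokens[0].lower() == "enable"
    for token in tokens[1:]:
        log.append("it[%s].active = %s" % (token, active))

def toggle_all_items_action(s, loc, tokens):
    'enable / disable ALL items'
    active = tokens[0].lower() == "enable"
    log.append("it.enable_all(%s)" % active)

area = pp.Regex(r'[a-zA-Z]')
category = pp.Regex(r'[a-zA-Z]\.\d{1,2}')
criteria = pp.Regex(r'[a-zA-Z]\.\d{1,2}\.\d{1,2}')

# Keyword (not Literal), so it only matches the whole word "all".
all_ = pp.CaselessKeyword("all")

# Negative lookahead: an item may not start where the keyword ALL starts.
item = ~all_ + (area ^ category ^ criteria)

toggle = pp.CaselessKeyword('enable') | pp.CaselessKeyword('disable')
toggle_all_items = (toggle + all_).setParseAction(toggle_all_items_action)
toggle_item = (toggle + pp.OneOrMore(item)).setParseAction(toggle_item_action)

# Check for ALL first, then fall back to individual items.
expr = toggle_all_items | toggle_item
more = expr + pp.ZeroOrMore(pp.Suppress(';') + expr)

more.parseString("""
ENABLE A.1 B.1.1;
DISABLE ALL
""", parseAll=True)

print("\n".join(log))
```

With the lookahead in place, the order of the two alternatives matters less, since "ALL" can no longer be consumed as three one-letter areas.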