I am trying to use pyparsing for the first time.
My parser is not doing what I hope it would do, could someone please check and see what is wrong. I am trying to embedd OneOrMore within OneOrMore, which I think should work fine but it is not.
below is the whole code:
import pyparsing
status = """
sale number : 11/7
NAME ID PAWN PRICE TIME %C STATE START/STOP
cross-cu-1 1055 1 106284K 07:49:36.19 25.05% run 1d01h
cross-cu-2 918 1 104708K 07:38:19.08 24.02% run 1d01h
sale number : 11/8
NAME ID PAWN PRICE TIME %C STATE START/STOP
cross-cu-3 1055 1 106284K 07:49:36.19 25.05% run 1d01h
cross-cu-4 918 1 104708K 07:38:19.08 24.02% run 1d01h
"""
integer = pyparsing.Word(pyparsing.nums).setParseAction(lambda toks: int(toks[0]))
decimal = pyparsing.Word(pyparsing.nums + ".").setParseAction(lambda toks: float(toks[0]))
wordSuppress = pyparsing.Suppress(pyparsing.Word(pyparsing.alphas))
endOfLine = pyparsing.LineEnd().suppress()
colon = pyparsing.Suppress(":")
saleNumber = pyparsing.Regex("\d{2}\/\d{1}").setResultsName("saleNumber")
lineSuppress = pyparsing.Regex("NAME.*STOP") + endOfLine
saleRow = wordSuppress + wordSuppress + colon + saleNumber + endOfLine
name = pyparsing.Regex("cross-cu-\d").setResultsName("name")
id = integer.setResultsName("id")
pawn = integer.setResultsName("pawn")
price = integer.setResultsName("price") + "K"
time = pyparsing.Regex("\d{2}:\d{2}:\d{2}.\d{2}").setResultsName("time")
c = decimal.setResultsName("c") + "%"
state = pyparsing.Word(pyparsing.alphas).setResultsName("state")
startStop = pyparsing.Word(pyparsing.alphanums).setResultsName("startStop")
row = name + id + pawn + price + time + c + state + startStop + endOfLine
table = pyparsing.OneOrMore(pyparsing.Group(saleRow + lineSuppress.suppress() + (pyparsing.OneOrMore(pyparsing.Group(row) | pyparsing.SkipTo(row).suppress())) ) | pyparsing.SkipTo(saleRow).suppress())
resultDic = [x.asDict() for x in table.parseString(status)]
print resultDic
It returns only [{'saleNumber': '11/7'}]
I was hoping to get a list of dic like this:
[{ {'saleNumber': '11/7'},{ elements in cross-cu-1 line, elements in cross-cu-2 line } },
{ {'saleNumber': '11/8'},{ elements in cross-cu-3 line, elements in cross-cu-4 line } }]
Any help is appreciated!
Please don´t suggest other ways of implementing this output! I am trying to learn pyparsing as well!
In this case pyparsing is probably overkill. Why don’t you simply read the file line by line and then parse the results?
The code would look like this:
EDIT: I have updated the code to follow your example more closely.
from collections import defaultdict