I am extracting information from a large XML file using Python and lxml's parser target interface. I would like to be able to set a limit after which parsing stops. Here is some code:
Parser target code:
class TitleTarget(object):
    def __init__(self, limit=None):
        self.limit = limit
        self.counter = 0

    def start(self, tag, attrib):
        if self.limit and self.counter > self.limit:
            #### BREAK HERE ####
            return False
        #doProcessing(attrib)
        self.counter = self.counter + 1

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        pass
Code initiating the parsing:
from lxml import etree

parser = etree.XMLParser(target=TitleTarget(limit))
etree.parse(file, parser)
I know that execution reaches the “BREAK HERE” line, but I haven’t found any way to stop the parsing from there. I have tried returning True, False, and [], and raising an Error; none of them seems to work. The parser always runs until the end of the file.
Is there a way to stop processing by using this method?
Instead of using etree.parse(file, parser), you can loop over the lines in file and call parser.feed on each line. This gives you control over when to break out of the loop.
Now you can just set self.done = True in the target, and test for target.done in the feed loop.
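The feed loop described above might look roughly like the following. This is only a sketch: the `done` flag and the `parse_with_limit` helper are names made up for illustration, the file is assumed to be opened in binary mode, and the `ImportError` fallback to the standard library's `xml.etree.ElementTree` (which offers the same target/feed interface) is there only so the sketch runs where lxml is not installed.

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: fall back to the stdlib parser, which has the same
    # target/feed/close interface, so the sketch runs without lxml.
    import xml.etree.ElementTree as etree


class TitleTarget(object):
    def __init__(self, limit=None):
        self.limit = limit
        self.counter = 0
        self.done = False  # hypothetical flag checked by the feed loop

    def start(self, tag, attrib):
        # One feed() call may deliver several start events, so ignore
        # everything that arrives after the limit has been reached.
        if self.done:
            return
        # doProcessing(attrib)
        self.counter += 1
        if self.limit is not None and self.counter >= self.limit:
            self.done = True

    def end(self, tag):
        pass

    def data(self, data):
        pass

    def close(self):
        return self.counter


def parse_with_limit(file, limit=None):
    """Feed `file` to the parser line by line, stopping once the target is done."""
    target = TitleTarget(limit)
    parser = etree.XMLParser(target=target)
    for line in file:
        parser.feed(line)
        if target.done:
            break  # stop reading the rest of the file
    else:
        # Only finalize the parser if the whole document was fed.
        parser.close()
    return target.counter


if __name__ == "__main__":
    sample = b"<root>\n<a/>\n<b/>\n<c/>\n</root>\n"
    print(parse_with_limit(io.BytesIO(sample), limit=2))
```

Note that the loop can only stop between calls to feed, so the rest of the current line is still parsed; the check at the top of start keeps those extra events from being processed. If the file has very long lines (or no newlines at all), reading fixed-size chunks with file.read(size) instead of iterating over lines should work the same way.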