When using the HTMLParser class in Python, is it possible to abort processing within a handle_* function? Early in the processing, I get all the data I need, so it seems like a waste to continue processing. There’s an example below of extracting the meta description for a document.
    from html.parser import HTMLParser  # Python 2: from HTMLParser import HTMLParser

    class MyParser(HTMLParser):
        # The start-tag hook is named handle_starttag
        def handle_starttag(self, tag, attrs):
            in_meta = False
            if tag == 'meta':
                for attr in attrs:
                    if attr[0].lower() == 'name' and attr[1].lower() == 'description':
                        in_meta = True
                    if in_meta and attr[0].lower() == 'content':
                        print(attr[1])
                        # Would like to tell the parser to stop now,
                        # since I have all the data that I need
You can raise an exception and wrap your .feed() call in a try block. You can also call self.reset() when you decide that you are done (I have not actually tried it, but according to the documentation, “Reset the instance. Loses all unprocessed data.”, which is precisely what you need).
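A minimal sketch of the exception approach, assuming Python 3 (where the class lives in html.parser). The names StopParsing, MetaDescriptionParser and get_meta_description are made up for illustration and are not part of the library:

```python
from html.parser import HTMLParser


class StopParsing(Exception):
    """Hypothetical signal: raised once the wanted data has been found."""


class MetaDescriptionParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta':
            return
        # attrs is a list of (name, value) pairs; value may be None
        attr_map = dict(attrs)
        if (attr_map.get('name') or '').lower() == 'description':
            self.description = attr_map.get('content')
            # Abort the rest of the document; the caller catches this
            raise StopParsing


def get_meta_description(html):
    parser = MetaDescriptionParser()
    try:
        parser.feed(html)
    except StopParsing:
        pass  # expected: we stopped early on purpose
    return parser.description
```

Because the exception propagates out of .feed(), nothing after the matching meta tag is handed to the handlers. Looking the attributes up in a dict also means the result does not depend on whether name or content comes first in the tag.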