I’m beginning to learn Python, and my Python version is 3.1. I’ve never learnt OOP before, so I’m confused by HTMLParser.
from html.parser import HTMLParser

class parser(HTMLParser):
    # HTMLParser calls this method for each run of text between tags
    def handle_data(self, data):
        print(data)

p = parser()
page = """<html><h1>title</h1><p>I'm a paragraph!</p></html>"""
p.feed(page)
I get this output:
title
I'm a paragraph!
Instead of printing this data, I want to pass it to a function. What should I do?
Sorry for my poor English, and thank you for your help!
I did not look into the HTMLParser module itself, but I can see that feed parses the page and calls handle_data for each piece of text it finds, and in your derived class handle_data does a print. @ron’s answer suggests passing the data directly to your function, which is totally OK. However, since you are new to OOP, maybe take a look at this code.
This is Python 2.x, but I think the only thing that would change is the module location: html.parser instead of HTMLParser.
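A minimal Python 3 sketch of the parser described below, reconstructed from that description (a class MyParser that overrides feed and handle_data and collects the text in self.output), so details may differ from the original code. In Python 3, super().feed(data) is the usual way to call the base-class method:

```python
from html.parser import HTMLParser


class MyParser(HTMLParser):
    """Collects the text of a page in self.output instead of printing it."""

    def feed(self, data):
        # Override of HTMLParser.feed: start a fresh list on every call,
        # then let the base class do the actual parsing. While parsing,
        # the base class calls self.handle_data for each piece of text.
        self.output = []
        super().feed(data)

    def handle_data(self, data):
        # Override of HTMLParser.handle_data (a no-op in the base class):
        # append each piece of text to the end of the list.
        self.output.append(data)


page = """<html><h1>title</h1><p>I'm a paragraph!</p></html>"""
p = MyParser()
p.feed(page)
print(p.output)  # ['title', "I'm a paragraph!"]
```

Because output is reset on every call to feed, feeding a page in several chunks leaves only the text from the last chunk in p.output.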
Here I am overriding the feed method of HTMLParser. When you call p.feed(page), it calls my method instead, which creates (or resets) an instance variable called output, set to an empty list, and then calls the feed method of the base class (HTMLParser), which proceeds with what it normally does. So, by overriding the feed method, I was able to do some extra work (adding a new output variable). The handle_data method is likewise an override. In fact, the handle_data method of HTMLParser doesn’t do anything at all (according to the docs). So, just to clarify:
1. You call p.feed(page), which calls the MyParser.feed method.
2. MyParser.feed sets self.output to an empty list, then calls HTMLParser.feed.
3. HTMLParser.feed parses the page and calls handle_data for each piece of text; the handle_data method adds that text to the end of the output list.
4. You now have access to the data via p.output, which you can pass to any function.
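The other route mentioned at the start of this answer, passing each piece of text directly to a function instead of collecting it first, could be sketched as follows. This is not @ron’s actual code; CallbackParser, the callback argument and my_function are hypothetical names chosen for illustration:

```python
from html.parser import HTMLParser


class CallbackParser(HTMLParser):
    """Forwards every piece of text to a function supplied by the caller.

    CallbackParser and its `callback` argument are illustrative names,
    not part of the html.parser API.
    """

    def __init__(self, callback):
        super().__init__()
        self.callback = callback  # any function taking one string argument

    def handle_data(self, data):
        # Called by HTMLParser for each run of text between tags.
        self.callback(data)


def my_function(text):
    # Hypothetical stand-in for whatever should happen to the data.
    print("got:", text)


page = """<html><h1>title</h1><p>I'm a paragraph!</p></html>"""
p = CallbackParser(my_function)
p.feed(page)
p.close()  # flush any text the parser may still be buffering
# got: title
# got: I'm a paragraph!
```

Storing the function on the instance in __init__ keeps the parser reusable: the same class can print, save or count the text depending on which function you pass in.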