I have this small class: class HTMLTagStripper(HTMLParser): def __init__(self): self.reset() self.fed = [] def

Question

Asked: June 7, 2026

I have this small class:

class HTMLTagStripper(HTMLParser):
    def __init__(self):
       self.reset()
       self.fed = []
    def handle_data(self, data):
       self.fed.append(data)
    def handle_starttag(self, tag, attrs):
       if tag == 'a':
           return attrs[0][1]
    def get_data(self):
       return ''.join(self.fed)

parsing this HTML code:

<div id="footer">
<p>long text.</p>
<p>click <a href="somelink.com">here</a>
</div>

This is the result I get: long text click here

but I want to get: long text click somelink.com

Is there a way to do this?

1 Answer

Editorial Team · Answer 1 · 2026-06-07T20:31:11+00:00

I was actually checking out this new HTML parser library and came up with this solution:

from htmldom import htmldom
dom = htmldom.HtmlDom().createDom( """<div id="footer">
<p>long text.</p>
<p>click <a href="somelink.com">here</a>
</div>""")
nodes = dom.find( "p" ).children( all_children = True ) # all_children = True puts the text nodes in the set as well
for node in nodes:
    if node._is( "a" ):
        print( node.attr( "href" ).strip() )
    elif node._is( "text" ):
        print( node.getNode().text, end = '', sep = ' ' )

You can download the library from SourceForge or from the Python Package Index (HtmlDom). It works on Python 3.x. The documentation is not that good, but it is understandable. Hope you like the answer :)
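
If you would rather stay with the standard library, here is a minimal sketch of the same idea using html.parser.HTMLParser. The class in the question cannot work as written because HTMLParser ignores whatever the handler methods return, so the href returned from handle_starttag is simply thrown away. It also reads attrs[0][1], which only works when href happens to be the first attribute. The class name LinkTextReplacer and the in_link flag are my own illustrative names, and I am assuming you want the href to replace the link text and runs of whitespace collapsed into single spaces.

```python
from html.parser import HTMLParser


class LinkTextReplacer(HTMLParser):
    """Collects text, emitting an <a> tag's href in place of its link text."""

    def __init__(self):
        super().__init__()    # let HTMLParser set up its own state
        self.fed = []
        self.in_link = False  # True while inside an <a> that has an href

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; look href up by name
            # instead of assuming it is the first attribute.
            href = dict(attrs).get("href")
            if href:
                self.fed.append(href)
                self.in_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_link = False

    def handle_data(self, data):
        # Skip the link text itself; the href has already been recorded.
        if not self.in_link:
            self.fed.append(data)

    def get_data(self):
        # Collapse newlines and repeated spaces into single spaces.
        return " ".join("".join(self.fed).split())


html = """<div id="footer">
<p>long text.</p>
<p>click <a href="somelink.com">here</a>
</div>"""

parser = LinkTextReplacer()
parser.feed(html)
parser.close()
result = parser.get_data()
print(result)
```

With the HTML from the question this prints long text. click somelink.com (the period after "text" is kept, since it is part of the paragraph data). Links without an href keep their text unchanged.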
