I am trying to parse a website and retrieve the texts that contain Hyper


Editorial Team

Asked: June 16, 2026 (2026-06-16T01:48:56+00:00)

I am trying to parse a website and retrieve the text of each hyperlink.
For example:

<a href="www.example.com">This is an Example</a>

I need to retrieve “This is an Example”, which I am able to do for pages that don't have broken tags. I am unable to retrieve it in the following case:

<html>
<body>
<a href = "http:\\www.google.com">Google<br>
<a href = "http:\\www.example.com">Example</a>
</body>
</html>

In such cases the code is unable to retrieve “Google” because the tag around the Google link is never closed, and it only gives me “Example”. Is there a way to also retrieve “Google”?

My code is here:

from bs4 import BeautifulSoup
from bs4 import SoupStrainer

f = open("sol.html","r")

soup = BeautifulSoup(f,parse_only=SoupStrainer('a'))
for link in soup.findAll('a',text=True):
    print link.renderContents();

Please note that sol.html contains exactly the HTML shown above.

Thanks
– AJ

1 Answer

Editorial Team · Answer 1 · 2026-06-16T01:48:57+00:00

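Remove text=True from your code and it should work. With text=True, find_all only matches tags whose .string is set, that is, tags with exactly one string child. The unclosed Google link also contains a <br>, so its .string is None and it gets filtered out: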

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('''
... <html>
... <body>
... <a href = "http:\\www.google.com">Google<br>
... <a href = "http:\\www.example.com">Example</a>
... </body>
... </html>
... ''')
>>> [a.get_text().strip() for a in soup.find_all('a')]
[u'Google', u'Example']
>>> [a.get_text().strip() for a in soup.find_all('a', text=True)]
[u'Example']
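
The session above is Python 2 and lets Beautiful Soup pick its default parser. As a rough Python 3 sketch of the same idea, the snippet below names the parser explicitly, since how an unclosed <a> gets repaired depends on the parser: the built-in "html.parser" nests the second link inside the first, so get_text() on the first link would also include "Example". The helper direct_text is a hypothetical name, not part of Beautiful Soup; it joins only the strings that are direct children of a tag, which sidesteps that difference.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Same markup as in the question (raw string so the backslashes are kept as-is).
HTML = r"""<html>
<body>
<a href = "http:\\www.google.com">Google<br>
<a href = "http:\\www.example.com">Example</a>
</body>
</html>"""


def direct_text(tag):
    """Hypothetical helper: join only the strings that are direct children of tag."""
    parts = [child for child in tag.children if isinstance(child, NavigableString)]
    return "".join(parts).strip()


soup = BeautifulSoup(HTML, "html.parser")
links = soup.find_all("a")

# .string is None for a tag with more than one child; this is what text=True tests.
has_single_string = [a.string is not None for a in links]

# Text of each link, ignoring anything inside nested tags.
link_texts = [direct_text(a) for a in links]

print(has_single_string)
print(link_texts)
```

With "html.parser" this should report that only the second link has a usable .string, while direct_text still recovers both "Google" and "Example". If you pass "lxml" or "html5lib" instead, the tree may be repaired differently, so check the output against your own pages.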
