I have the following HTML pattern I want to scrape using BeautifulSoup

Asked: June 18, 2026

I have the following HTML pattern I want to scrape using BeautifulSoup. The HTML pattern is:

<a href="link" target="_blank" onclick="blah blah blah">TITLE</a>

I want to grab TITLE and the information displayed at the link. That is, if you click the link, the page shows a description of TITLE. I want that description.

I started with just trying to grab the title with the following code:

import urllib
from bs4 import BeautifulSoup
import re

webpage = urllib.urlopen("http://urlofinterest")

title = re.compile('<a>(.*)</a>')
findTitle = re.findall(title,webpage)
print findTitle

My output is:

% python beta2.py
[]

So this is obviously not even finding the title. I even tried <a href>(.*)</a>, and that didn’t work either. Based on my reading of the documentation, I thought BeautifulSoup would grab whatever text is between the symbols I give it (in this case, <a> and </a>), so what am I doing wrong?


Editorial Team · Answer 1 · 2026-06-18T05:23:38+00:00

How come you’re importing BeautifulSoup and then not using it at all?
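The regular expression in the question never touches BeautifulSoup: re.findall only looks for the literal pattern it is given. Run against the sample tag (Python 3 here), both patterns from the question come back empty because the real <a> tag carries attributes. A pattern that allows attributes does match, but regexes stay fragile on real-world HTML (nested tags, line breaks, unusual quoting), which is why a parser is the better tool. A minimal sketch:

```python
import re

# The sample tag from the question.
html = '<a href="link" target="_blank" onclick="blah blah blah">TITLE</a>'

# Both patterns from the question require a literal "<a>" or "<a href>",
# but the real tag has attributes, so neither ever matches.
print(re.findall(r'<a>(.*)</a>', html))       # []
print(re.findall(r'<a href>(.*)</a>', html))  # []

# Allowing any attributes inside the opening tag finds the text,
# but this still breaks easily on messier HTML.
print(re.findall(r'<a\b[^>]*>(.*?)</a>', html))  # ['TITLE']
```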

webpage = urllib.urlopen("http://urlofinterest")

You’ll want to read the data from it, so:

webpage = urllib.urlopen("http://urlofinterest").read()
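The line above is Python 2, where urllib.urlopen lives. In Python 3 the function moved to urllib.request.urlopen, and .read() returns bytes that need decoding before string matching or parsing. A minimal sketch; fetch_html is a hypothetical helper name, and falling back to UTF-8 when the server declares no charset is an assumption:

```python
from urllib.request import urlopen


def fetch_html(url):
    """Hypothetical helper: download a page and return it as text (Python 3)."""
    with urlopen(url) as response:
        raw = response.read()  # bytes in Python 3, not str
        # Use the charset the server declares; assume UTF-8 if it declares none.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Calling fetch_html("http://urlofinterest") gives a string that can be handed straight to BeautifulSoup.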

Something like this should get you to a point where you can go further:

>>> blah = '<a href="link" target="_blank" onclick="blah blah blah">TITLE</a>'
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(blah, 'html.parser') # change to webpage later
>>> for tag in soup('a', href=True):
...     print tag['href'], tag.string
...
link TITLE
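The question also asks for the description shown when the link is followed, which the loop above stops short of. One possible next step, sketched in Python 3 with BeautifulSoup: resolve each href against the page URL with urljoin, fetch the linked page, and pull a description out of it. Where the description lives on the linked page is not stated in the question, so the rule below (the page's <meta name="description"> content, else its first <p>) is purely a hypothetical placeholder to adapt; the function names are also hypothetical, and fetch is any function that takes a URL and returns HTML text, injected so the sketch can run offline:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url):
    """Return (absolute_url, title) pairs for every <a> that has an href."""
    soup = BeautifulSoup(html, "html.parser")
    return [(urljoin(base_url, a["href"]), a.get_text(strip=True))
            for a in soup.find_all("a", href=True)]


def extract_description(html):
    """Hypothetical rule: <meta name="description"> content, else the first <p>."""
    soup = BeautifulSoup(html, "html.parser")
    meta = soup.find("meta", attrs={"name": "description"})
    if meta and meta.get("content"):
        return meta["content"].strip()
    p = soup.find("p")
    return p.get_text(strip=True) if p else None


def scrape(html, base_url, fetch):
    """Return (title, description) for each link; fetch(url) must return HTML text."""
    return [(title, extract_description(fetch(url)))
            for url, title in extract_links(html, base_url)]
```

Using a parser rather than a regex means attribute order, extra attributes such as target and onclick, and relative hrefs are all handled without changing the code.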
