I have the following HTML pattern that I want to scrape using BeautifulSoup. The HTML pattern is:
<a href="link" target="_blank" onclick="blah blah blah">TITLE</a>
I want to grab TITLE and the information that is displayed at the link. That is, if you click the link, there is a description of the TITLE. I want that description.
I started with just trying to grab the title with the following code:
import urllib
from bs4 import BeautifulSoup
import re
webpage = urllib.urlopen("http://urlofinterest")
title = re.compile('<a>(.*)</a>')
findTitle = re.findall(title,webpage)
print findTitle
My output is:
% python beta2.py
[]
So this is obviously not even finding the title. I even tried <a href>(.*)</a> and that didn’t work. Based on my reading of the documentation, I thought BeautifulSoup would grab whatever text is between the symbols I give it, in this case <a> and </a>. So what am I doing wrong?
How come you’re importing BeautifulSoup and then not using it at all?
You’ll want to read the data from the object that urlopen returns, so that you are working with the HTML as a string:
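A minimal sketch of that step, assuming Python 3 (where the question's urllib.urlopen lives at urllib.request.urlopen). The helper name read_page is made up for illustration, and the UTF-8 fallback is an assumption about the page:

```python
import urllib.request


def read_page(url):
    """Return the body of `url` as text (hypothetical helper, not a library function)."""
    with urllib.request.urlopen(url) as response:
        # urlopen() returns a response object, not a string; read() gives the raw bytes.
        raw = response.read()
        # Use the charset declared in the headers, falling back to UTF-8 (an assumption).
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")


# Usage, with the placeholder URL from the question:
#     webpage = read_page("http://urlofinterest")
```

Passing the response object itself to re.findall does not work, because the regex functions expect a string rather than a file-like object.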
Something like (should get you to a point to go further):
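A rough sketch of what that could look like, assuming Python 3 and the bs4 package with its built-in "html.parser" backend. The function name extract_links and the base URL are illustrative choices, not something from the question. Note that the pattern <a>(.*)</a> cannot match here even on a string, because the real tag carries attributes between "<a" and ">"; asking BeautifulSoup for the tag avoids that problem:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def extract_links(html, base_url=""):
    """Return (title, url) pairs for every <a> tag that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for anchor in soup.find_all("a", href=True):
        # get_text() returns the text between <a ...> and </a>, i.e. TITLE.
        title = anchor.get_text(strip=True)
        # Resolve relative hrefs against the page they were found on.
        links.append((title, urljoin(base_url, anchor["href"])))
    return links


# The HTML pattern from the question, wrapped in a paragraph for the demo.
sample = '<p><a href="link" target="_blank" onclick="blah blah blah">TITLE</a></p>'
for title, url in extract_links(sample, "http://urlofinterest/"):
    print(title, url)
```

Each URL returned this way can then be fetched and parsed with BeautifulSoup in the same manner to get the description. Which element holds the description depends on the markup of the linked page, which the question does not show.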