I’m writing a small text-scraping script in Python. It’s my first bigger project, so I’m running into some problems. I’m using urllib2 and BeautifulSoup. I want to scrape the song names from one playlist. I can get either a single song name, or all the song names plus other strings that I don’t need, but I can’t manage to get only the song names. Here is my code that gets all the song names plus the other strings I don’t need:
import urllib2
from bs4 import BeautifulSoup
import re
response = urllib2.urlopen('http://guardsmanbob.com/media/playlist.php?char=a').read()
soup = BeautifulSoup(response)
for tr in soup.findAll('tr')[0]:
    for td in soup.findAll('a'):
        print td.contents[0]
And the code that gives me one song:
print soup.findAll('tr')[1].findAll('a')[0].contents[0]
It’s not actually a loop, so I can’t get more than one song, but when I try to turn it into a loop, I get the same song name about 10 times. That code:
for tr in soup.findAll('tr')[1]:
    for td in soup.findAll('td')[0]:
        print td.contents[0]
I’ve been stuck for a day now and I can’t get it working. I don’t understand how these things work.
Use findAll('tr') instead of findAll('tr')[0].
Use “for td in tr.find”, not “for td in soup.find”, because you want to look inside each tr, not in the whole document (soup).
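Putting those two points together, here is a minimal sketch of what the loop could look like. It is written for Python 3 and uses find_all, the current bs4 name for findAll. SAMPLE_HTML and song_names are hypothetical names made up for illustration, and the inline table is only a stand-in for the real playlist page, whose markup may differ. It also assumes that the song name is the first link in each row, as in the question's one-song example.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the playlist page; the real markup may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Song</th><th>Artist</th></tr>
  <tr><td><a href="/s/1">First Song</a></td><td><a href="/a/1">Artist One</a></td></tr>
  <tr><td><a href="/s/2">Second Song</a></td><td><a href="/a/2">Artist Two</a></td></tr>
</table>
"""


def song_names(html):
    """Return the text of the first link in every table row that has one."""
    soup = BeautifulSoup(html, "html.parser")
    names = []
    # Iterate over every row, not over the children of a single row.
    for tr in soup.find_all("tr"):
        # Search inside this row only, not in the whole document.
        link = tr.find("a")
        if link is not None:  # rows without a link (e.g. a header row) are skipped
            names.append(link.get_text(strip=True))
    return names


if __name__ == "__main__":
    for name in song_names(SAMPLE_HTML):
        print(name)
```

To run it against the real page, you would pass the downloaded HTML to song_names instead of SAMPLE_HTML, and adjust the row and link lookup if the page is laid out differently.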