I’m new to Python and I’m writing a web scraper that looks for <td> cells in the rows of an HTML table:
    # open CSV with URLs to scrape
    csv_file = csv.reader(open('urls.csv', 'rb'), delimiter=',')
    names = []
    for data in csv_file:
        names.append(data[0])
    for name in names:
        html = D.get(name)
        html2 = html
        param = '<br />'
        html2 = html2.replace("<br />", " | ")
        print name
        c = csv.writer(open("darkgrey.csv", "a"))
        for row in xpath.search(html2, '//table/tr[@class="bgdarkgrey"]'):
            cols = xpath.search(row, '/td')
            c.writerow([cols[0], cols[1], cols[2], cols[3], cols[4]])
All it does is get the values from the first five <td> cells of each matching row.
The problem is that some of the tables don’t have cols[2], cols[3] or cols[4].
Is there a way I can check whether these exist?
Thanks
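I’m not completely familiar with xpath, but you should be able to just check the length of cols with len(cols) before indexing into it (as long as it isn’t a really strange object that looks like a sequence in other ways).
Another common Python idiom is to just try it and see: index cols[2] and so on inside a try block, and catch the IndexError that a list raises when the column is missing.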
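Here is a minimal sketch of both approaches, using plain lists as stand-ins for the xpath results. The helper names first_five_by_length and first_five_by_try are made up for illustration, and the sketch assumes cols supports len() and raises IndexError on a missing index, as a list does:

```python
def first_five_by_length(cols):
    """Check len(cols) before indexing; missing cells become ''."""
    return [cols[i] if i < len(cols) else '' for i in range(5)]


def first_five_by_try(cols):
    """Just try the index and fall back to '' when it does not exist."""
    row = []
    for i in range(5):
        try:
            row.append(cols[i])
        except IndexError:
            # Assumption: cols behaves like a list and raises IndexError here.
            row.append('')
    return row


short_row = ['a', 'b']                  # a row with only two <td> cells
full_row = ['a', 'b', 'c', 'd', 'e']    # a row with all five cells

by_length = first_five_by_length(short_row)
by_try = first_five_by_try(short_row)
```

Either helper could then feed the writer directly, e.g. c.writerow(first_five_by_length(cols)). The length check is explicit about what is being tested; the try/except version only needs indexing to work, so it may be the safer choice if cols turns out not to be a real list.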
Finally, assuming cols is a list, you can always make sure that it is long enough by appending empty strings to it, which will pad your columns so that you have at least 5 columns (usually more).
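The padding step might look like the sketch below. The exact expression is an assumption (it is one way to get "at least 5 columns, usually more"), and the slice that trims the result back to five is an extra step added here for illustration:

```python
cols = ['a', 'b']            # stand-in for a short xpath result (assumed to be a list)

# Always append five empty strings, so the list has at least five items
# no matter how short it was to begin with.
padded = cols + [''] * 5

# Optional: keep only the five columns that are actually written out.
row = padded[:5]
```

With this in place, c.writerow(row) never hits a missing index, and short rows simply get empty cells in the CSV.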