I need to categorize this HTML page: http://gnats.netbsd.org/summary/year/2012-perf.html. I need to make a list of the top issues using only the big table. This is my code in Python. I would be really grateful if you could give me some advice.
import urllib.request
from bs4 import BeautifulSoup
# overall input
inputpage = urllib.request.urlopen("http://gnats.netbsd.org/summary/year/2012-perf.html")
page = inputpage.read()
soup = BeautifulSoup(page)
# checking tables
table = soup.findAll('table')
rows = soup.findAll('tr')
colomns = soup.findAll('td')
# inputing the lists
name = []
first = []
second = []
sum = []
# the main part
for tr in rows:
    if (tr==1):
        element = tr.split("<td>")
        name.append(element)
    elif (tr==2):
        element = tr.split("<td>")
        first.append(element)
    elif (tr==3):
        element = tr.split("<td>")
        second.append(element)
# combining the open and closed issue lists
length = len(first)
for i in range(length):
    sum = first[i] + second [i]
# printing the lists
length = len(sum)
for i in range(length):
    print (name[i] + '|' + sum[i])
BeautifulSoup has some nice methods for accessing child nodes and so on. You could, for example, use tables = soup.findAll('table'). Assuming you want to combine the data of the second table in the link you posted (tables[1]), you could loop over its rows and append the text of each td cell to a list kept in a dictionary (cdict) keyed by column index.

So, what you’ll end up with is a dictionary of columns -> lists, so that
each list contains the td text elements (the main reason for using a dictionary
is that you may want to grab the elements from columns 1, 2 and 4, in which case
you’ll only need to change what is in the cdict).
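
The column-collecting loop described above might look roughly like this. It is only a sketch under a few assumptions: the inline SAMPLE_HTML (with made-up category names and counts) stands in for the real page so the code runs offline, the helper name collect_columns is hypothetical, and the wanted table is assumed to be tables[1] with the name, open count and closed count in columns 0, 1 and 2. find_all is the current spelling of findAll.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page, so the sketch runs offline.
SAMPLE_HTML = """
<table><tr><td>navigation</td></tr></table>
<table>
  <tr><th>Category</th><th>Open</th><th>Closed</th></tr>
  <tr><td>kern</td><td>12</td><td>30</td></tr>
  <tr><td>bin</td><td>5</td><td>7</td></tr>
</table>
"""


def collect_columns(table, wanted):
    """Return {column index: [cell text, ...]} for the wanted columns."""
    cdict = {index: [] for index in wanted}
    for row in table.find_all("tr"):
        # Header rows use <th>, so they yield no <td> cells and are skipped.
        for index, cell in enumerate(row.find_all("td")):
            if index in cdict:
                cdict[index].append(cell.get_text(strip=True))
    return cdict


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tables = soup.find_all("table")
# Assumption: the second table on the page is the one of interest.
cdict = collect_columns(tables[1], wanted=(0, 1, 2))
print(cdict)
```

To grab different columns you would only change the wanted tuple, which is the point of keying the lists by column index.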
To make the sums, you can add the corresponding entries of the two count lists element by element, converting the text to integers first.
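
A minimal sketch of the summing step, assuming cdict has the shape described above, that columns 1 and 2 hold the open and closed counts, and that every cell parses as an integer. The sample values are made up for illustration; sorting by total is one way to get a "top issues" list.

```python
# Hypothetical column data: 0 = category name, 1 = open, 2 = closed.
cdict = {
    0: ["kern", "bin", "lib"],
    1: ["12", "5", "3"],
    2: ["30", "7", "1"],
}

# Cell text is a string, so convert before adding.
totals = [int(opened) + int(closed) for opened, closed in zip(cdict[1], cdict[2])]

# Pair each name with its total and put the largest totals first.
ranked = sorted(zip(cdict[0], totals), key=lambda pair: pair[1], reverse=True)

for name, total in ranked:
    print(f"{name}|{total}")
```

On the real page some cells may be empty or non-numeric, in which case int() would raise a ValueError and you would need to filter or default those values.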
If you have a look at each element’s methods you’ll see some really nice functionality you can use to make your task easier.
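
As a small illustration of that, here are a few of the attributes and methods a Tag object offers. The one-row table is a made-up example, not taken from the page in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table, used only to have an element to inspect.
html = '<table><tr class="row"><td>kern</td><td>12</td></tr></table>'
soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

tag_name = row.name                  # name of the tag itself
parent_name = row.parent.name        # name of the enclosing tag
row_classes = row.get("class")       # attribute lookup, None if missing
cells = [td.get_text(strip=True) for td in row.find_all("td")]

# Siblings can be walked directly from a cell.
first_cell = row.find("td")
second_text = first_cell.find_next_sibling("td").get_text()

print(tag_name, parent_name, row_classes, cells, second_text)
```

Running dir(row) in an interactive session lists the rest.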