I need to categorize this html page http://gnats.netbsd.org/summary/year/2012-perf.html , I need to make a

Question

Asked: June 14, 2026 (2026-06-14T22:27:33+00:00)


I need to categorize this HTML page, http://gnats.netbsd.org/summary/year/2012-perf.html, and make a list of the top issues using only the big table. This is my code in Python. I would be really grateful if you could give me some advice.

import urllib.request
from bs4 import BeautifulSoup

# overall input
inputpage = urllib.request.urlopen("http://gnats.netbsd.org/summary/year/2012-perf.html")
page = inputpage.read()
soup = BeautifulSoup(page)

# checking tables
table = soup.findAll('table')
rows = soup.findAll('tr')
colomns = soup.findAll('td')

# inputing the lists
name = []
first = []
second = []
sum = []

# the main part
for tr in rows:
    if (tr==1):
        element = tr.split("<td>")
        name.append(element)
    elif (tr==2):
        element = tr.split("<td>")
        first.append(element)
    elif (tr==3):
        element = tr.split("<td>")
        second.append(element)


# combining the open and closed issue lists
length = len(first)
for i in range(length):
    sum = first[i] + second [i]

# printing the lists
length = len(sum)
for i in range(length):
    print (name[i] + '|' + sum[i])


1 Answer

Editorial Team · Answer 1 · 2026-06-14T22:27:34+00:00

BeautifulSoup has convenient methods for accessing child nodes. For example, tables = soup.findAll('table') returns every table on the page. Assuming you want to combine the data from the second table in the link you posted (tables[1]), you could do something like the following:

names = []
cdict = {0: [], 1: []}  # maps a <td> position to the list of cell texts in that column

# soup is the BeautifulSoup object built in the question
tables = soup.findAll('table')
for tt in tables[1].find_all('tr')[1:]:  # skip the first <tr> since it is the header
    # the 1st column is a <th> holding the name; keep its text rather than the Tag
    names.append(tt.find_all('th')[0].get_text(strip=True))
    cells = tt.find_all('td')
    for k, v in cdict.items():
        # append the text of the <td> in column k to the corresponding list
        v.append(cells[k].get_text(strip=True))

So, what you’ll end up with is a dictionary mapping column positions to lists, where each list contains the text of the td elements in that column. The main reason for using a dictionary is that you may later want to grab the elements from columns 1, 2 and 4 instead, in which case you only need to change the keys in cdict.
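As a minimal sketch of that idea, the snippet below wraps the loop in a hypothetical helper, collect_columns, and runs it on a small made-up table. The HTML, the row names, and the numbers are illustrative assumptions, not data from the NetBSD page. Changing the wanted argument is the only thing needed to grab a different set of columns.

```python
from bs4 import BeautifulSoup

# Made-up sample table for illustration; the real page's layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Open</th><th>Closed</th><th>Other</th></tr>
  <tr><th>kern</th><td>5</td><td>7</td><td>1</td></tr>
  <tr><th>bin</th><td>2</td><td>3</td><td>0</td></tr>
</table>
"""


def collect_columns(table, wanted):
    """Return (names, columns); columns maps each wanted <td> index to its cell texts."""
    names = []
    columns = {k: [] for k in wanted}
    for row in table.find_all("tr")[1:]:  # skip the header row
        names.append(row.find("th").get_text(strip=True))
        cells = row.find_all("td")
        for k in wanted:
            columns[k].append(cells[k].get_text(strip=True))
    return names, columns


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Pick the first and third <td> of every data row.
names, columns = collect_columns(soup.find("table"), wanted=[0, 2])
print(names)
print(columns)
```

Note that the indices count td cells only, so the th holding the name is not included in the numbering.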

To make the sums you can do something like:

for i in range(len(names)):
    print(names[i], int(cdict[0][i]) + int(cdict[1][i]))
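Since the goal in the question is a list of top issues, one possible next step is to sort the names by their combined totals. The sketch below assumes names holds plain strings and cdict holds the cell texts for the two columns being summed. The values shown are made up for illustration, and top_issues is a hypothetical helper name.

```python
def top_issues(names, cdict, limit=3):
    """Return up to `limit` (name, total) pairs, largest combined total first."""
    totals = [
        (name, int(first) + int(second))
        for name, first, second in zip(names, cdict[0], cdict[1])
    ]
    # sorted() is stable, so rows with equal totals keep their table order.
    return sorted(totals, key=lambda pair: pair[1], reverse=True)[:limit]


# Made-up values standing in for the scraped data.
names = ["kern", "bin", "lib", "port"]
cdict = {0: ["5", "2", "9", "1"], 1: ["7", "3", "0", "1"]}

for name, total in top_issues(names, cdict):
    print(name, total)
```

Using zip avoids indexing by position and stops at the shortest list, so a row that failed to parse cannot cause an IndexError here.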

If you have a look at each element’s methods you’ll see some really nice functionality you can use to make your task easier.
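To give a rough idea of what those methods look like, here is a short sketch on a made-up fragment of markup (the tag layout, class name, and link target are assumptions for illustration only). It uses only standard BeautifulSoup 4 calls: find, get, get_text, dictionary-style attribute access, and the parent and name properties.

```python
from bs4 import BeautifulSoup

# Made-up markup for illustration only.
html = '<table><tr class="row"><th><a href="/pr/1">kern</a></th><td> 5 </td></tr></table>'
soup = BeautifulSoup(html, "html.parser")

row = soup.find("tr")   # first matching descendant
link = row.find("a")

row_classes = row.get("class")               # attribute lookup; class comes back as a list
href = link["href"]                          # dictionary-style attribute access
link_text = link.get_text()                  # text inside the tag
cell_text = row.td.get_text(strip=True)      # first <td> child, surrounding whitespace stripped
parent_name = link.parent.name               # tag name of the enclosing element

print(row_classes, href, link_text, cell_text, parent_name)
```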
