SO I have the following set of code parsing delicious information. It prints data

Question

Asked: May 18, 2026 (2026-05-18T20:36:24+00:00)

So I have the following code that parses Delicious information. It prints data from a Delicious page in the following format:

Bookmark | Number of People

Bookmark | Number of People
etc…

I used to use the following method to find this info:

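# Assumed imports (not shown in the original post): mechanize and BeautifulSoup 3
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

def extract(soup):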
    links = soup.findAll('a',rel='nofollow')
    for link in links:
        print >> outfile, link['href']

    hits = soup.findAll('span', attrs={'class': 'delNavCount'})
    for hit in hits:
        print >> outfile, hit.contents


#File to export data to
outfile = open("output.txt", "w")

#Browser Agent
br = Browser()    
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]


url= "http://www.delicious.com/asd"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
extract(soup)

But the problem was that some bookmarks didn't have a number of people, so I decided to parse it differently, so that I would get both pieces of data at the same time and print the bookmarks and the number of people side by side.

EDIT: Got it down from 15 to 5 seconds with this updated version. Any more suggestions?

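# Assumed imports (not shown in the original post): mechanize and BeautifulSoup 3
from mechanize import Browser
from BeautifulSoup import BeautifulSoup

def extract(soup):
    # Each bookmark lives in its own <div class="data"> container
    bookmarkset = soup.findAll('div', 'data')
    for bookmark in bookmarkset:
        link = bookmark.find('a')
        vote = bookmark.find('span', 'delNavCount')
        try:
            print >> outfile, link['href'], " | " ,vote.contents
        except:
            # link or vote is missing for this bookmark
            print >> outfile, "[u'0']"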
    #print bookmarkset


#File to export data to
outfile = open("output.txt", "w")

#Browser Agent
br = Browser()    
br.set_handle_robots(False)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]


url= "http://www.delicious.com/asd"
page = br.open(url)
html = page.read()
soup = BeautifulSoup(html)
extract(soup)

The performance of this is terrible though: it takes 17 seconds to parse the first page, and around 15 seconds thereafter, on a pretty decent machine. It degraded significantly when going from the first bit of code to the second bit. Is there anything I can do to improve performance here?

1 Answer

Editorial Team · Answer 1 · 2026-05-18T20:36:25+00:00

I don’t understand why you are assigning to vote twice. The first assignment is unnecessary and quite expensive, since it parses the whole document again on each iteration. Why?

   vote = BeautifulSoup(html)
   vote = bookmark.findAll('span', attrs={'class': 'delNavCount'})
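
To illustrate the "parse the document only once" idea, here is a minimal sketch that reads the page in a single pass and pairs each bookmark with its count, falling back to 0 when the count is missing. Note that it swaps in Python 3 and the standard library's html.parser instead of BeautifulSoup, so it is not the code from the question. The class names (data, delNavCount) come from the question's code; BookmarkParser, extract and the SAMPLE markup are made up for illustration, and the real Delicious markup may differ. I have not measured whether this is faster on the real page.

```python
from html.parser import HTMLParser


class BookmarkParser(HTMLParser):
    """Collect (href, count) pairs in one pass over the document."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished (href, count) pairs
        self._depth = 0         # <div> nesting depth inside the current bookmark
        self._href = None       # first link seen in the current bookmark
        self._count = None      # text of the delNavCount span, if any
        self._in_count = False  # True while inside the count span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div":
            if self._depth:
                # Nested div inside a bookmark container
                self._depth += 1
            elif "data" in classes:
                # Start of a new bookmark container
                self._depth = 1
                self._href, self._count = None, None
        elif self._depth:
            if tag == "a" and self._href is None:
                self._href = attrs.get("href")
            elif tag == "span" and "delNavCount" in classes:
                self._in_count = True

    def handle_data(self, data):
        if self._in_count:
            self._count = (self._count or "") + data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._in_count = False
        elif tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0 and self._href is not None:
                # Bookmark container closed: emit the pair, defaulting to "0"
                self.rows.append((self._href, self._count or "0"))


def extract(html):
    """Parse the HTML once and return a list of (href, count) tuples."""
    parser = BookmarkParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical sample markup, only for demonstration
SAMPLE = """
<div class="data"><a href="http://example.com/a">A</a><span class="delNavCount">12</span></div>
<div class="data"><a href="http://example.com/b">B</a></div>
"""

for href, count in extract(SAMPLE):
    print(href, "|", count)
```

This sketch assumes the div tags are properly closed and that the count span contains no nested span. Bookmarks without a link are skipped rather than printed.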
