I have code that uses the BeautifulSoup library for parsing, but it is very


Asked: May 14, 2026


I have code that uses the BeautifulSoup library for parsing, but it is very slow. The code is written in such a way that threads cannot be used.
Can anyone help me with this?

I am using BeautifulSoup for parsing and then saving the results into a DB. If I comment out the save statement, it still takes a long time, so the database is not the problem.

def parse(self,text):                
    soup = BeautifulSoup(text)
    arr = soup.findAll('tbody')                

    for i in range(0,len(arr)-1):
        data=Data()
        soup2 = BeautifulSoup(str(arr[i]))
        arr2 = soup2.findAll('td')

        c=0
        for j in arr2:                                       
            if str(j).find("<a href=") > 0:
                data.sourceURL = self.getAttributeValue(str(j),'<a href="')
            else:  
                if c == 2:
                    data.Hits=j.renderContents()

            #and few others...

            c = c+1

            data.save()

Any suggestions?

Note: I already asked this question here, but it was closed due to incomplete information.


1 Answer

Editorial Team · Answer 1 · 2026-05-14T14:51:18+00:00

soup2 = BeautifulSoup(str(arr[i]))
arr2 = soup2.findAll('td')

Don’t do this: it turns the already-parsed tag back into a string and then parses that string a second time. Just call arr2 = arr[i].findAll('td') instead.
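As a small check that nothing is lost by skipping the re-parse, the sketch below compares both approaches on a made-up table. It assumes BeautifulSoup 4 (the `bs4` package, where `find_all` is the current spelling of `findAll`) and the standard-library `html.parser` backend; the sample HTML is invented for illustration.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (the bs4 package)

# Made-up sample markup, not taken from the question.
html = "<table><tbody><tr><td>a</td><td>b</td></tr></tbody></table>"
soup = BeautifulSoup(html, "html.parser")
tbody = soup.find_all("tbody")[0]

# Slow: serialize the Tag to a string, then parse that string all over again.
reparsed_cells = BeautifulSoup(str(tbody), "html.parser").find_all("td")

# Fast: search the Tag that is already in memory.
direct_cells = tbody.find_all("td")

# Both approaches find the same cells.
reparsed_text = [cell.get_text() for cell in reparsed_cells]
direct_text = [cell.get_text() for cell in direct_cells]
print(reparsed_text == direct_text)
```

The direct search avoids one full serialize-and-parse cycle per `tbody`, which is where most of the time in the question's loop is likely to go.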

This will also be slow:

if str(j).find("<a href=") > 0:
    data.sourceURL = self.getAttributeValue(str(j),'<a href="')

Assuming that getAttributeValue gives you the href attribute, use this instead:

a = j.find('a', href=True)       # find the first <a> that has an href attribute
if a is not None:
    data.sourceURL = a['href']
else:
    pass                         # handle the other cells here

In general, you shouldn’t need to convert the BeautifulSoup object back into a string if all you want to do is parse it and extract values. Since the find and findAll methods give you back searchable objects, you can keep searching by invoking the find/findAll/etc. methods on the results.
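Putting those points together, the question's loop could look roughly like the sketch below. It is only an illustration under several assumptions: BeautifulSoup 4 (`bs4`) with the built-in `html.parser`, a plain dict standing in for the question's `Data` class, `get_text` in place of `renderContents`, and a hypothetical function name `parse_rows`. It also visits every `tbody`, whereas the question's `range(0, len(arr)-1)` skips the last one.

```python
from bs4 import BeautifulSoup  # assumption: BeautifulSoup 4 (the bs4 package)


def parse_rows(html):
    """Return one dict per <tbody>, searching each Tag directly instead of re-parsing it."""
    soup = BeautifulSoup(html, "html.parser")  # parse the document exactly once
    rows = []
    for tbody in soup.find_all("tbody"):
        # Hypothetical stand-in for the question's Data object.
        record = {"sourceURL": None, "hits": None}
        for index, cell in enumerate(tbody.find_all("td")):
            link = cell.find("a", href=True)  # first <a> with an href, or None
            if link is not None:
                record["sourceURL"] = link["href"]
            elif index == 2:
                record["hits"] = cell.get_text(strip=True)
        rows.append(record)
    return rows


# Made-up sample input for illustration.
sample = """
<table>
  <tbody><tr><td><a href="http://example.com/a">A</a></td><td>x</td><td>42</td></tr></tbody>
  <tbody><tr><td><a href="http://example.com/b">B</a></td><td>y</td><td>7</td></tr></tbody>
</table>
"""
result = parse_rows(sample)
print(result)
```

Saving would then happen once per record, after its cells have been read, rather than inside the cell loop.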
