Is there anyway to improve the speed of this script by using lxml or


Asked: June 2, 2026


Is there any way to improve the speed of this script by using lxml or mechanize and cutting out BeautifulSoup altogether?

python:

import lxml.html as html
import urllib
import urlparse
from BeautifulSoup import BeautifulSoup
import os

print("downloading and parsing bibles...")
# all.html holds one <a href="..."> per chapter to fetch
root = html.parse(open('all.html'))
for link in root.findall('//a'):
  url = link.get('href')
  path = urlparse.urlparse(url).path
  name = path.split('/')[-1]      # e.g. gen.1.nmv-fas
  dirname = path.split('.')[-1]   # e.g. nmv-fas
  # download the page over HTTP
  s = urllib.urlopen(url).read()
  if not os.path.isdir(dirname):
    os.mkdir(dirname)
  # parse with BeautifulSoup and keep only the <article> element
  soup = BeautifulSoup(s)
  articleTag = soup.html.body.article
  converted = str(articleTag)
  full_path = os.path.join(dirname, name)
  with open(full_path, 'w') as out:
    out.write(converted)
  print(name)
print("downloads complete!")

all.html:

<a href="http://www.youversion.com/bible/gen.1.nmv-fas">http://www.youversion.com/bible/gen.1.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.2.nmv-fas">http://www.youversion.com/bible/gen.2.nmv-fas</a>

<a href="http://www.youversion.com/bible/gen.3.nmv-fas">http://www.youversion.com/bible/gen.3.nmv-fas</a>


1 Answer

Editorial Team · Answer 1 · 2026-06-02T15:10:00+00:00


You should start by measuring what actually takes time in your script. Optimizing something that is not slow is a waste of your time.
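One way to take that measurement is to time each step separately and add up the totals. The following is only a rough sketch in Python 3 (the question's script is Python 2): `timed`, `fake_download` and `fake_parse` are hypothetical names, and the two fake functions are stand-ins for the real `urlopen(...).read()` and `BeautifulSoup(...)` calls so that the example runs without network access.

```python
import time

def timed(label, func, totals, *args):
    """Call func(*args) and add the elapsed wall-clock seconds to totals[label]."""
    start = time.perf_counter()
    result = func(*args)
    totals[label] = totals.get(label, 0.0) + (time.perf_counter() - start)
    return result

# Hypothetical stand-ins for the real steps in the script.
def fake_download(url):
    time.sleep(0.05)  # simulated network latency, an assumption for the demo
    return "<html><body><article>%s</article></body></html>" % url

def fake_parse(page):
    # Crude extraction of the <article> content, just to have work to time.
    return page.split("<article>")[1].split("</article>")[0]

totals = {}
for url in ["gen.1", "gen.2", "gen.3"]:
    page = timed("download", fake_download, totals, url)
    article = timed("parse", fake_parse, totals, page)

# Print the accumulated time per step.
for label, seconds in sorted(totals.items()):
    print("%-8s %.3f s" % (label, seconds))
```

Wrapping the real download and parse calls the same way shows which of the two dominates before any parser is swapped out.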

It is probably the download, not the parsing. In that case, switching the parser will not help. To speed up downloading many files, using threads (one for each download) may help, because another download can start before the first one has completed.
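A minimal sketch of the threaded approach, assuming Python 3 (`concurrent.futures` and `urllib.request`, whereas the question's script uses the Python 2 `urllib`). It uses a bounded thread pool rather than strictly one thread per file; `fetch`, `fetch_all` and the `max_workers` value of 8 are illustrative choices, not something taken from the question.

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def fetch(url):
    """Download one URL and return (url, body_bytes)."""
    with urlopen(url) as response:
        return url, response.read()

def fetch_all(urls, fetch_one=fetch, max_workers=8):
    """Run fetch_one over urls on a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map overlaps the calls across threads, so one download
        # can be in flight while another is still waiting on the network.
        return list(pool.map(fetch_one, urls))
```

With the URLs collected from all.html into a list, `pages = fetch_all(urls)` would return the downloaded bodies, which can then be parsed and written to disk as in the original loop. The `fetch_one` parameter exists only so that a different download function can be substituted, for example when testing without network access.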
