I am trying to create a crawler that crawl first 100 pages on a

Question

0

Editorial Team

Asked: June 5, 20262026-06-05T18:25:04+00:00 2026-06-05T18:25:04+00:00

I am trying to create a crawler that crawl first 100 pages on a

0

I am trying to create a crawler that crawl first 100 pages on a website:

My code is something like this:

def extractproducts(pagenumber):
    contenturl = "http://websiteurl/page/" + str(pagenumber)

    content = BeautifulSoup(urllib2.urlopen(contenturl).read())
    print pagehtml



pagenumberlist = range(1, 101)

for pagenumber in pagenumberlist:
    extractproducts(pagenumber)

How do i go about using threading module in this situation so that urllib will crawl X number of URLs at a time using mutli threads?

/newb out

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-05T18:25:05+00:00

Most likely, you want to use multiprocessing. There’s a Pool you can use to execute multiple things in parallel:

from multiprocessing import Pool

# Note: This many threads may make your system unresponsive for a while
p = Pool(100)

# First argument is the function to call,
# second argument is a list of arguments
# (the function is called on each item in the list)
p.map(extractproducts, pagenumberlist)

If your function returns anything, Pool.map will return a list of return values:

def f(x):
    return x + 1

results = Pool().map(f, [1, 4, 5])
print(results) # [2, 5, 6]

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am trying to create a crawler that crawl first 100 pages on a

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply