I am trying to follow the multithreading example given in: Python urllib2.urlopen() is slow, need a better way to read several urls

Question

Asked: May 28, 2026

I am trying to follow the multithreading example given in "Python urllib2.urlopen() is slow, need a better way to read several urls", but I keep getting a "thread error" and I am not sure what it means.

import urllib2
import threading
import Queue
from urllib2 import HTTPError

urlList = [list of urls to be fetched] * 100

def read_url(url, queue):
    my_data = []
    try:
        data = urllib2.urlopen(url, None, 15).read()
        print('Fetched %s from %s' % (len(data), url))
        my_data.append(data)
        queue.put(data)
    except HTTPError, e:
        # retry once, this time without a timeout
        data = urllib2.urlopen(url).read()
        print('Fetched %s from %s' % (len(data), url))
        my_data.append(data)
        queue.put(data)

def fetch_parallel():
    result = Queue.Queue()
    threads = [threading.Thread(target=read_url, args=(url, result)) for url in urlList]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

res = []
res = fetch_parallel()
reslist = []
while not res.empty: reslist.append(res.get())
print (reslist)

The first error I get is:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "demo.py", line 76, in read_url
    print('Fetched %s from %s' % (len(data), url))
TypeError: object of type 'instancemethod' has no len()

Sometimes it does seem to fetch data, but then I get this second error:

Traceback (most recent call last):
  File "demo.py", line 89, in <module>
    print str(res[0])
AttributeError: Queue instance has no attribute '__getitem__'

When it does fetch data, why doesn't the result show up in res[]? Thanks for your time.

Update: after changing read to read() in read_url(), the situation has improved (many pages are now fetched), but I still get this error:

Exception in thread Thread-86:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 552, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 505, in run
    self.__target(*self.__args, **self.__kwargs)
  File "demo.py", line 75, in read_url
    data = urllib2.urlopen(url).read()
  File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 429, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 605, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 397, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 502: Bad Gateway

1 Answer

Editorial Team · Answer 1 · 2026-05-28T17:20:21+00:00

Note that urllib2 is not thread-safe. Therefore, you should really use urllib3, which is designed to be used from multiple threads (its connection pools are thread-safe).

Some of your problems are entirely unrelated to threading. Threads just make the error reporting more complex. Instead of

data = urllib2.urlopen(url).read

you want

data = urllib2.urlopen(url).read()
#                               ^^
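That missing pair of parentheses is exactly what the first traceback reports: without the call, data is the bound read method rather than the page content, and a method has no length. The following Python 3 sketch reproduces it, using io.BytesIO as an assumed stand-in for the response object so it runs without network access (Python 3 names the method type differently in the error message, but the cause is the same):

```python
import io

# io.BytesIO stands in for the object returned by urlopen(); both have a read() method.
response = io.BytesIO(b"<html>hello</html>")

not_called = response.read    # the bound method object, not the page content
print(callable(not_called))   # True

try:
    len(not_called)           # a method has no length, so this raises TypeError
except TypeError as exc:
    print("TypeError:", exc)

data = response.read()        # calling read() returns the bytes
print(len(data))              # 18
```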

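A 502 Bad Gateway error indicates a problem on the server side: a gateway or proxy in front of the web service received an invalid response from the server behind it (most likely, an internal server of the web service you're connecting to is rebooting or not available). There's nothing you can do about it on your end; the URL is just not reachable right now. Note that in your update the 502 is raised by the retry inside your except block (urllib2.urlopen(url).read() at line 75 of demo.py), so nothing catches it and the thread dies. Use try/except to handle these errors, for example by printing a diagnostic message, by scheduling the URL to be retrieved again after an appropriate waiting period, or by leaving out the failed data set.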
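As one way to do that, here is a minimal sketch of retrying with a growing wait between attempts. It assumes Python 3, where urllib2 was split into urllib.request and urllib.error; the helper names fetch_with_retry and default_fetch, and the attempts and delay parameters, are hypothetical. The fetch function is passed in as a parameter only so the retry logic can be tried without a network:

```python
import time
import urllib.error
import urllib.request


def default_fetch(url, timeout=15):
    """Fetch a URL with urllib.request (Python 3's successor to urllib2.urlopen)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(url, fetch=default_fetch, attempts=3, delay=1.0):
    """Try to fetch url up to `attempts` times, doubling the wait after each failure.

    Returns the page bytes, or None if every attempt failed, so the caller
    can decide whether to skip the URL or schedule it for later.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except urllib.error.HTTPError as exc:
            # The server answered with an error status, e.g. "HTTP Error 502: Bad Gateway".
            # HTTPError is a subclass of URLError, so it must be caught first.
            print("Attempt %d for %s failed: %s" % (attempt, url, exc))
        except urllib.error.URLError as exc:
            # No usable HTTP response at all, e.g. DNS failure or refused connection.
            print("Attempt %d for %s failed: %s" % (attempt, url, exc.reason))
        if attempt < attempts:
            time.sleep(delay)
            delay *= 2
    return None
```

Returning None instead of raising keeps the worker thread alive; whether a failed URL should be dropped or queued for a later pass is left to the caller.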

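fetch_parallel() returns a Queue.Queue, not a list, which is why res[0] fails: a queue cannot be indexed. To get the values out, drain the queue into a list. Note the parentheses on empty(): without them, res.empty is a bound method, which is always true, so while not res.empty never runs its body. You can do the following: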

res = fetch_parallel()
reslist = []
while not res.empty():
  reslist.append(res.get_nowait()) # or get, doesn't matter here
print (reslist)
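For comparison, here is the question's fetch_parallel() pattern sketched for Python 3, where the Queue module is named queue. The fetch parameter and the drain() helper are hypothetical additions so the code can run without network access; in real use, fetch could be something like lambda u: urllib.request.urlopen(u, timeout=15).read(). Each thread puts a (url, data) pair on the queue, so results stay attributable to their URL even though threads finish in arbitrary order:

```python
import queue
import threading


def read_url(url, results, fetch):
    """Worker: fetch one URL and put (url, data) on the shared queue."""
    data = fetch(url)
    results.put((url, data))


def fetch_parallel(urls, fetch):
    """Start one thread per URL, wait for all of them, return the filled queue."""
    results = queue.Queue()
    threads = [threading.Thread(target=read_url, args=(url, results, fetch))
               for url in urls]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # after this loop, no thread is still writing to the queue
    return results


def drain(q):
    """Copy everything out of a queue into a list (a Queue cannot be indexed)."""
    items = []
    while not q.empty():  # empty() is called; q.empty alone is always truthy
        items.append(q.get_nowait())
    return items
```

Because every thread has been joined before drain() runs, nothing else is writing to the queue, so checking empty() and then calling get_nowait() is safe here.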

There is also no way around real error handling in case a URL is really unreachable. Simply re-requesting it might work in some cases, but you must be able to handle the case that the remote host is truly unreachable at this time. How you do that depends on your application’s logic.
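One possible shape for that error handling, sketched in Python 3: instead of one thread per URL, a concurrent.futures.ThreadPoolExecutor (a swapped-in technique, not what the question uses) bounds the number of simultaneous requests, and each network or HTTP failure is recorded next to its URL instead of killing a thread. The function name fetch_all and its parameters are hypothetical; fetch is any function that takes a URL and returns the page bytes:

```python
import concurrent.futures
import urllib.error


def fetch_all(urls, fetch, max_workers=10):
    """Fetch urls on a bounded thread pool.

    Returns (pages, failures): pages maps url -> bytes for the URLs that worked,
    failures maps url -> the exception raised, so the application can decide
    whether to retry those URLs later or leave them out.
    """
    pages, failures = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_url = {pool.submit(fetch, url): url for url in urls}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                pages[url] = future.result()  # re-raises the worker's exception, if any
            except OSError as exc:
                # urllib.error.URLError and HTTPError are subclasses of OSError,
                # as are socket timeouts; other exceptions still propagate.
                failures[url] = exc
    return pages, failures
```

Results are keyed by URL, so if the list contains duplicates (as urlList * 100 does), each URL appears only once in the output.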
