I am using python urllib2 to download pages from the web. I am not using any kind of user_agent etc. I am getting below sample errors. Can someone tell me a easy way to avoid them.
http://www.rottentomatoes.com/m/foxy_brown/
The server couldn't fulfill the request.
Error code: 403
http://www.spiritus-temporis.com/marc-platt-dancer-/
The server couldn't fulfill the request.
Error code: 503
http://www.golf-equipment-guide.com/news/Mark-Nichols-(golfer).html!!
The server couldn't fulfill the request.
Error code: 500
http://www.ehx.com/blog/mike-matthews-in-fuzz-documentary!!
We failed to reach a server.
Reason: timed out
IncompleteRead(5621 bytes read)
Traceback (most recent call last):
File "download.py", line 43, in <module>
localFile.write(response.read())
File "/usr/lib/python2.6/socket.py", line 327, in read
data = self._sock.recv(rbufsize)
File "/usr/lib/python2.6/httplib.py", line 517, in read
return self._read_chunked(amt)
File "/usr/lib/python2.6/httplib.py", line 563, in _read_chunked
raise IncompleteRead(value)
IncompleteRead: IncompleteRead(5621 bytes read)
Thank you
Bala
Many web resources require some kind of cookie or other authentication to access, your 403 status codes are most likely the result of this.
503 errors tend to mean you’re rapidly accessing resources from a server in a loop and you need to wait briefly before attempting another access.
The 500 example doesn’t even appear to exist…
The timeout error may not need the “!!”, I can only load the resource without it.
I recommend you read up on http status codes.