I have problems with my code.
#!/usr/bin/env python3.1
import urllib.request;
# Disguise as a Mozila browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)';
URL = "www.example.com/img";
req = urllib.request.Request(URL, headers={'User-Agent' : userAgent});
# Counter for the filename.
i = 0;
while True:
fname = str(i).zfill(3) + '.png';
req.full_url = URL + fname;
f = open(fname, 'wb');
try:
response = urllib.request.urlopen(req);
except:
break;
else:
f.write(response.read());
i+=1;
response.close();
finally:
f.close();
The problem seems to come when I create the urllib.request.Request object (called req). I create it with a non-existing url but later I change the url to what it should be. I’m doing this so that I can use the same urllib.request.Request object and not have to create new ones on each iteration. There is probably a mechanism for doing exactly that in python but I’m not sure what it is.
EDIT
Error message is:
>>> response = urllib.request.urlopen(req);
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.1/urllib/request.py", line 121, in urlopen
return _opener.open(url, data, timeout)
File "/usr/lib/python3.1/urllib/request.py", line 356, in open
response = meth(req, response)
File "/usr/lib/python3.1/urllib/request.py", line 468, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.1/urllib/request.py", line 394, in error
return self._call_chain(*args)
File "/usr/lib/python3.1/urllib/request.py", line 328, in _call_chain
result = func(*args)
File "/usr/lib/python3.1/urllib/request.py", line 476, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
EDIT 2: My solution is the following. Probably should have done this at the start as I knew it would work:
import urllib.request;
# Disguise as a Mozila browser on a Windows OS
userAgent = 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)';
# Counter for the filename.
i = 0;
while True:
fname = str(i).zfill(3) + '.png';
URL = "www.example.com/img" + fname;
f = open(fname, 'wb');
try:
req = urllib.request.Request(URL, headers={'User-Agent' : userAgent});
response = urllib.request.urlopen(req);
except:
break;
else:
f.write(response.read());
i+=1;
response.close();
finally:
f.close();
urllib2is fine for small scripts that only need to do one or two network interactions, but if you are doing a lot more work, you will likely find that eitherurllib3, orrequests(which not coincidentally is built on the former), may suit your needs better. Your particular example might look like:Edit: If you can use regular HTTP authentication (for which the
403 Forbiddenresponse strongly suggests), then you can add that to arequests.getwith theauthparameter, as in: