I have a script that I’d like to keep using, but it looks like I either have to find a workaround for a bug in Python 3 or downgrade back to 2.6, which would mean downgrading my other scripts as well…
Hopefully someone here has already managed to find a workaround.
The problem is that, with the changes in Python 3.0 regarding bytes and strings, apparently not all of the library code has been tested.
I have a script that downloads a page from a web server. In Python 2.6 this script passed a username and password as part of the URL, but in Python 3.0 this doesn’t work any more.
For instance, this:
    import urllib.request
    url = 'http://username:password@server/file'
    urllib.request.urlretrieve(url, 'temp.dat')
fails with this exception:
Traceback (most recent call last):
  File "C:\Temp\test.py", line 5, in <module>
    urllib.request.urlretrieve(url, 'test.html');
  File "C:\Python30\lib\urllib\request.py", line 134, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python30\lib\urllib\request.py", line 1476, in retrieve
    fp = self.open(url, data)
  File "C:\Python30\lib\urllib\request.py", line 1444, in open
    return getattr(self, name)(url)
  File "C:\Python30\lib\urllib\request.py", line 1618, in open_http
    return self._open_generic_http(http.client.HTTPConnection, url, data)
  File "C:\Python30\lib\urllib\request.py", line 1576, in _open_generic_http
    auth = base64.b64encode(user_passwd).strip()
  File "C:\Python30\lib\base64.py", line 56, in b64encode
    raise TypeError('expected bytes, not %s' % s.__class__.__name__)
TypeError: expected bytes, not str
Apparently, base64.b64encode now requires bytes as input (and returns bytes), so urlretrieve (or some code inside it), which builds up a str of username:password and tries to base64-encode it for basic authorization, fails.
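To illustrate the bytes requirement, here is a minimal sketch of how a basic-authorization header value can be built in Python 3. The credentials are just the placeholders from the URL above, and `header_value` is a name made up for this example; it is not the library's actual code or fix.

```python
import base64

# Placeholder credentials, taken from the example URL above.
user_passwd = "username:password"

# b64encode only accepts bytes in Python 3, so encode the str first.
# The result is bytes as well, so decode it before building the header text.
token = base64.b64encode(user_passwd.encode("ascii"))
header_value = "Basic " + token.decode("ascii")

# Passing the str directly reproduces the kind of TypeError seen in the traceback.
try:
    base64.b64encode(user_passwd)
    str_was_rejected = False
except TypeError:
    str_was_rejected = True
```

Non-ASCII credentials would need a different encoding choice, which this sketch does not address.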
If I instead try to use urlopen, like this:
    import urllib.request
    url = 'http://username:password@server/file'
    f = urllib.request.urlopen(url)
    contents = f.read()
Then it fails with this exception:
Traceback (most recent call last):
  File "C:\Temp\test.py", line 5, in <module>
    f = urllib.request.urlopen(url);
  File "C:\Python30\lib\urllib\request.py", line 122, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python30\lib\urllib\request.py", line 359, in open
    response = self._open(req, data)
  File "C:\Python30\lib\urllib\request.py", line 377, in _open
    '_open', req)
  File "C:\Python30\lib\urllib\request.py", line 337, in _call_chain
    result = func(*args)
  File "C:\Python30\lib\urllib\request.py", line 1082, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Python30\lib\urllib\request.py", line 1051, in do_open
    h = http_class(host, timeout=req.timeout) # will parse host:port
  File "C:\Python30\lib\http\client.py", line 620, in __init__
    self._set_hostport(host, port)
  File "C:\Python30\lib\http\client.py", line 632, in _set_hostport
    raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: 'password@server'
Apparently the URL parsing in this ‘next gen url retrieval library’ doesn’t know what to do with a username and password in the URL: everything after the first colon in the host part is treated as a port.
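One possible way around the parsing problem is to pull the credentials out of the URL before handing it to urlopen. The following is only a sketch using `urllib.parse.urlsplit`; the helper name `split_credentials` is made up for illustration, and the URL is the placeholder from above.

```python
from urllib.parse import urlsplit

def split_credentials(url):
    """Return (url_without_credentials, username, password).

    Hypothetical helper: strips the user:password@ part from the
    network location and returns the pieces separately.
    """
    parts = urlsplit(url)
    # Rebuild the host part without the credentials, keeping any explicit port.
    netloc = parts.hostname or ""
    if parts.port is not None:
        netloc += ":%d" % parts.port
    clean_url = parts._replace(netloc=netloc).geturl()
    return clean_url, parts.username, parts.password

clean_url, username, password = split_credentials(
    "http://username:password@server/file"
)
```

The cleaned URL can then be opened normally, with the username and password supplied through an authentication handler instead of the URL itself.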
What other choices do I have?
Direct from the Py3k docs: http://docs.python.org/dev/py3k/library/urllib.request.html#examples
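A minimal sketch along the lines of the basic-authentication example in those docs, assuming the server uses HTTP basic auth. The helper names `build_auth_opener` and `fetch` are made up for this example, and the server URL and credentials are the placeholders from the question.

```python
import urllib.request

def build_auth_opener(top_level_url, username, password):
    """Build an opener that answers basic-auth challenges for top_level_url."""
    # The default-realm manager lets us register credentials without
    # knowing the realm the server will announce (realm=None).
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, top_level_url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)

def fetch(opener, url):
    """Download url through the given opener and return the body as bytes."""
    response = opener.open(url)
    try:
        return response.read()
    finally:
        response.close()

# Placeholder values; note the URL no longer contains the credentials.
opener = build_auth_opener("http://server/", "username", "password")
```

Calling `fetch(opener, "http://server/file")` would then perform the download. To make plain `urllib.request.urlopen` calls use the same credentials, the opener can be installed globally with `urllib.request.install_opener(opener)`.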