I’m writing something to ‘clean’ a URL. In this case all I’m trying to do is add a default scheme when one is missing, since urlopen won’t work without one. However, if I test this with www.python.org it returns http:///www.python.org. Does anyone know why there is an extra /, and is there a way to return the URL without it?
def FixScheme(website):
    from urlparse import urlparse, urlunparse
    scheme, netloc, path, params, query, fragment = urlparse(website)
    if scheme == '':
        return urlunparse(('http', netloc, path, params, query, fragment))
    else:
        return website
The problem is that when parsing the very incomplete URL www.python.org, the string you give is actually taken as the path component of the URL, with the netloc (network location) component empty, as well as the scheme. When urlunparse reassembles the parts, it writes http:// followed by the empty netloc and then a / in front of the path, which is where the third slash comes from. For defaulting the scheme you can pass a second parameter, scheme, to urlparse (simplifying your logic), but that doesn’t help with the “empty netloc” problem, so you need some extra logic for that case.
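One way to handle the empty-netloc case is sketched below. This is a minimal sketch, not a complete URL cleaner: the names fix_scheme and default_scheme are hypothetical, and it assumes that when no netloc was parsed, the first segment of the path is really the host name. The import is written so that it works with either the Python 2 urlparse module or Python 3's urllib.parse.

```python
try:
    from urllib.parse import urlparse, urlunparse  # Python 3
except ImportError:
    from urlparse import urlparse, urlunparse      # Python 2


def fix_scheme(website, default_scheme='http'):
    """Return website with a scheme, defaulting to default_scheme.

    Hypothetical helper: assumes a URL without a netloc is a bare
    host (optionally followed by a path), e.g. "www.python.org/about".
    """
    # The second argument supplies the scheme only when the URL has none.
    scheme, netloc, path, params, query, fragment = urlparse(
        website, default_scheme)

    if not netloc:
        # A bare host was parsed as the path: move its first segment
        # into netloc and keep whatever follows as the real path.
        netloc, _, rest = path.partition('/')
        path = '/' + rest if rest else ''

    return urlunparse((scheme, netloc, path, params, query, fragment))


print(fix_scheme('www.python.org'))           # http://www.python.org
print(fix_scheme('www.python.org/about'))     # http://www.python.org/about
print(fix_scheme('https://www.python.org/'))  # https://www.python.org/
```

URLs that already carry a scheme and netloc pass through unchanged, so the function can be applied unconditionally before calling urlopen.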