I’ve been able to retrieve the HTML of regular web pages using Python and the urllib2 module.
But when I try it with a URL that contains a colon, it doesn’t work.
This code:
f = urllib2.urlopen("http://http://gulasidorna.eniro.se/hitta:svenska+kyrkan/")
htmlcode = f.read()
print htmlcode
It generates this error message:
File "/Users/jonathan/Documents/Dropbox/Python/eniro.py", line 137, in <module>
f = urllib2.urlopen("http://http://gulasidorna.eniro.se/hitta:svenska+kyrkan/")
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 394, in open
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 412, in _open
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1199, in http_open
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1140, in do_open
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 693, in __init__
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 718, in _set_hostport
httplib.InvalidURL: nonnumeric port: ''
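This should work once you remove the extra http:// at the start of the URL. The colon in the path is not the problem: with the doubled prefix, the host is parsed as "http" followed by an empty port, which is what raises the "nonnumeric port: ''" error.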
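As a rough sketch of the fix, the snippet below compares how the doubled and the corrected URL are split, and fetches the page only through a helper. The names `strip_duplicate_scheme` and `fetch_html` are hypothetical, introduced here for illustration, and the import fallbacks are an assumption so that the same code runs on Python 2 (as in the question) and Python 3.

```python
try:
    # Python 2, as used in the question
    from urllib2 import urlopen
    from urlparse import urlsplit
except ImportError:
    # Python 3 equivalents (assumption: the reader may be on Python 3)
    from urllib.request import urlopen
    from urllib.parse import urlsplit

BAD_URL = "http://http://gulasidorna.eniro.se/hitta:svenska+kyrkan/"
GOOD_URL = "http://gulasidorna.eniro.se/hitta:svenska+kyrkan/"


def strip_duplicate_scheme(url):
    """Hypothetical helper: drop a repeated leading 'http://'."""
    prefix = "http://"
    while url.startswith(prefix + prefix):
        url = url[len(prefix):]
    return url


def fetch_html(url):
    """Hypothetical helper: fetch the page after cleaning up the URL."""
    return urlopen(strip_duplicate_scheme(url)).read()


# With the doubled prefix, the "host" part is just "http:" (host "http",
# empty port), which is what httplib rejects as a nonnumeric port.
print(urlsplit(BAD_URL).netloc)   # http:

# With the prefix fixed, the colon stays in the path and causes no trouble.
print(urlsplit(GOOD_URL).netloc)  # gulasidorna.eniro.se
print(urlsplit(GOOD_URL).path)    # /hitta:svenska+kyrkan/
```

Calling `fetch_html(BAD_URL)` would then request the corrected URL. Whether the site still serves that page is not something this sketch can guarantee.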