I am attempting to do the following:
request = urllib2.Request(url=url, headers={ 'User-Agent' : 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT' })
response = urllib2.urlopen(request)
HTML_response = response.read()
response.close()
return BeautifulSoup(HTML_response)
However, on some pages (always the same ones, and the order in which they are fetched does not seem to matter) I get:
Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/queues.py", line 268, in _feed
    send(obj)
  File "/usr/local/lib/python2.7/dist-packages/BeautifulSoup.py", line 439, in __getnewargs__
    return (NavigableString.__str__(self),)
RuntimeError: maximum recursion depth exceeded while calling a Python object
The pages do exist, so I don't think adding except urllib2.HTTPError: will help.
It's working fine with BeautifulSoup 3.2.
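For what it's worth, the traceback above comes from multiprocessing/queues.py (_feed, send(obj)) and from BeautifulSoup's __getnewargs__, which is a pickling hook, so the recursion seems to happen while the parsed object is being pickled for the queue rather than during the download. A minimal sketch of that effect, using only the standard library: build_nested and survives_queue_pickling are hypothetical helpers, and the nested list is only a stand-in for a deep parse tree, not BeautifulSoup itself.

```python
import pickle


def build_nested(depth):
    """Hypothetical stand-in for a deep parse tree: a list nested `depth` times."""
    node = []
    for _ in range(depth):
        node = [node]
    return node


def survives_queue_pickling(obj):
    """Return True if obj can be pickled, the step a multiprocessing queue
    performs before sending an object to another process."""
    try:
        pickle.dumps(obj)
        return True
    except RuntimeError:
        # Python 2 raises RuntimeError here; Python 3 raises RecursionError,
        # which is a subclass of RuntimeError.
        return False


# Raw markup is a flat string, however deeply its tags are nested.
html_response = "<div>" * 5000 + "text" + "</div>" * 5000

# A deeply nested object is pickled recursively, one level per nesting step.
tree = build_nested(100000)

print(survives_queue_pickling(html_response))
print(survives_queue_pickling(tree))
```

If this matches what is going on, one possible workaround (an assumption, not tested against the pages in question) is to return HTML_response from the worker and call BeautifulSoup(HTML_response) in the process that reads from the queue, so only a plain string crosses the process boundary.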