_mysql_exceptions.Warning: Incorrect string value: '\xE7\xB9\x81\xE9\xAB\x94...' for column 'html' at row 1
def getSource(theurl, moved=0):
    if moved == 1:
        theurl = urllib2.urlopen(theurl).geturl()
    urlReq = urllib2.Request(theurl)
    urlReq.add_header('User-Agent', random.choice(agents))
    urlResponse = urllib2.urlopen(urlReq)
    htmlSource = urlResponse.read()
    return htmlSource
new_u = Url(source_url = source_url, source_url_short = source_url_short, source_url_hash = source_url_hash, html = htmlSource)
new_u.save()
Why is this happening?
I am basically downloading the HTML source of a page and then saving it to a database using Django.
It only happens sometimes; other times it works fine.
Edit: it seems like I have to set the database to UTF-8? What is the command to do that?
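You basically need to ensure that the string is properly encoded. For example, if the string you hand to Django is not in the encoding the database expects (the page may not be UTF-8, or the column may not use a UTF-8 character set), some characters can’t be stored.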
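A quick way to check which side is at fault is to test whether the bytes quoted in the warning are valid UTF-8. The snippet below is only a minimal sketch; `is_valid_utf8` is a hypothetical helper name, not part of Django or urllib2, and `sample` is copied from the warning in the question.

```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode('utf-8')
    except UnicodeDecodeError:
        return False
    return True


# The first six bytes quoted in the MySQL warning from the question.
sample = b'\xe7\xb9\x81\xe9\xab\x94'
print(is_valid_utf8(sample))  # True
```

If the bytes do turn out to be valid UTF-8, as they do for this sample, that would point towards the character set of the `html` column or of the connection rather than the downloaded page, though this is an inference from the warning and not something the question confirms.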
Some helpful advice on how to find the encoding of the requested page can be found here: urllib2 read to Unicode
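As a rough illustration of that idea, the sketch below decodes the raw bytes using the charset named in the `Content-Type` header and falls back to UTF-8 when the header is missing or wrong. `decode_html` and its parameters are hypothetical names chosen for this example, and the fallback to UTF-8 with replacement characters is an assumption, not something the linked advice prescribes.

```python
import re


def decode_html(raw, content_type=None, default='utf-8'):
    """Decode raw page bytes to a unicode string.

    content_type is the value of the Content-Type header, if any,
    e.g. 'text/html; charset=ISO-8859-1'.
    """
    charset = default
    if content_type:
        # Pull the charset parameter out of the header value.
        match = re.search(r'charset=["\']?([\w.:-]+)', content_type, re.I)
        if match:
            charset = match.group(1)
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name or bytes that do not match the declared
        # charset: fall back and replace undecodable bytes.
        return raw.decode(default, 'replace')
```

In `getSource` this could be called as `decode_html(urlResponse.read(), urlResponse.headers.get('Content-Type'))`, so that Django receives a unicode string instead of raw bytes. Pages that declare their encoding only in a `<meta>` tag are not handled by this sketch.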