I’m using Scrapy to extract data from a website, and I’m saving the data to a MySQL database using MySQLdb. The script works for English sites, but when I try it on a Swedish site I get:
self.db.query(insertion_query)
exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 156:
ordinal not in range(128)
I have put the following line at the top of each file involved in the scraping process to indicate the use of international characters:
# -*- coding: utf-8 -*-
But I still get an error. What else do I need for Python to accept non-English characters? Here’s the full stack trace:
Traceback (most recent call last):
File "C:\Python27\lib\site-packages\scrapy-0.14.3-py2.7-win32.egg\scrapy\middleware.py", line 60, in _process_chain
return process_chain(self.methods[methodname], obj, *args)
File "C:\Python27\lib\site-packages\scrapy-0.14.3-py2.7-win32.egg\scrapy\utils\defer.py", line 65, in process_chain
d.callback(input)
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 368, in callback
self._startRunCallbacks(result)
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 464, in _startRunCallbacks
self._runCallbacks()
--- <exception caught here> ---
File "C:\Python27\lib\site-packages\twisted\internet\defer.py", line 551, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "C:\Python27\tco\tco\pipelines.py", line 64, in process_item
self.db.query(insertion_query)
exceptions.UnicodeEncodeError: 'ascii' codec can't encode character u'\xe4' in position 156:
ordinal not in range(128)
This Unicode issue looks confusing at first, but it’s actually pretty simple.
Putting that line at the top of your source file only tells Python to
treat the source code itself as UTF-8; it has no effect on incoming or outgoing data.
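To illustrate the difference, here is a minimal sketch. The sample text and the helper name `can_encode` are made up for illustration; the point is only that the coding declaration does not stop an ASCII encode from failing on Swedish characters.

```python
# -*- coding: utf-8 -*-
# The declaration above only tells Python how to decode this source file.
# It does not change how text is encoded when it leaves the program.

# Hypothetical scraped value containing the Swedish letters ö and ä.
text = u"G\xf6teborg \xe4r en stad"


def can_encode(value, codec):
    """Return True if the text can be represented in the given codec."""
    try:
        value.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


ascii_ok = can_encode(text, "ascii")  # ö and ä fall outside the 128 ASCII code points
utf8_ok = can_encode(text, "utf-8")   # UTF-8 can represent any Unicode character
```

The same failure happens implicitly whenever a Unicode string is handed to code that expects a byte string and falls back to the ASCII codec, which is what the traceback shows.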
You are writing data to your database, and this error happens
when one of the modules involved tries to encode your Unicode string (the Swedish text, I guess) as ASCII.
That means either the MySQL server or your MySQL driver is set to ASCII.
So I suggest checking your MySQL settings and your driver settings.
Passing charset='utf8' (and use_unicode=True) when you open the connection will make your MySQL driver connect to the MySQL server using UTF-8.
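A minimal sketch of such a connection setup, assuming the MySQLdb driver from the question. The helper names and the credentials are placeholders, not taken from the original code; `charset` and `use_unicode` are keyword arguments accepted by `MySQLdb.connect()`.

```python
def utf8_connection_kwargs(host, user, passwd, db):
    """Build keyword arguments for MySQLdb.connect() that request UTF-8.

    charset="utf8" sets the connection character set, and use_unicode=True
    asks the driver to return text columns as Unicode strings.
    """
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8",
        "use_unicode": True,
    }


def connect_utf8(**credentials):
    """Open a MySQLdb connection with the UTF-8 settings above."""
    # Imported lazily so the settings helper works without the driver installed.
    import MySQLdb
    return MySQLdb.connect(**utf8_connection_kwargs(**credentials))


# Placeholder credentials; replace them with your own.
kwargs = utf8_connection_kwargs("localhost", "scraper", "secret", "tco")
```

In the pipeline, something like `self.db = connect_utf8(host=..., user=..., passwd=..., db=...)` would replace the existing connect call. The table and its columns also need a character set that can store these characters.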