I have written a script to parse an email. It works fine with messages sent from the Mac OS X Mail client (the only one I have tested so far), but my parser fails when a message contains Unicode characters in its body.
For example, I sent a message with the content ąčę.
Here is the part of my script that parses the body and the attachments at the same time:
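    # Imports assumed from context (Python 2.7, Django 1.5)
    from email.feedparser import FeedParser
    from django.core.files.base import ContentFile

    # msg initially holds the raw message text
    p = FeedParser()
    p.feed(msg)
    msg = p.close()

    attachments = []
    body = None
    for part in msg.walk():
        # Skip multipart containers; only the leaf parts carry content
        if part.get_content_type().startswith('multipart/'):
            continue
        try:
            filename = part.get_filename()
        except Exception:
            # unicode letters in filename, set default name then
            filename = 'Mail attachment'
        if part.get_content_type() == "text/plain" and not body:
            # decode=True only undoes base64/quoted-printable; result is still bytes
            body = part.get_payload(decode=True)
        elif filename is not None:
            attachments.append(ContentFile(part.get_payload(decode=True), filename))
    if body is None:
        body = ''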
As I mentioned, it works with messages from OS X Mail, but not with messages sent from Gmail.
The traceback:
    Traceback (most recent call last):
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/core/handlers/base.py", line 116, in get_response
        response = callback(request, *callback_args, **callback_kwargs)
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/views/decorators/csrf.py", line 77, in wrapped_view
        return view_func(*args, **kwargs)
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/views/decorators/http.py", line 41, in inner
        return func(request, *args, **kwargs)
      File "/Users/aemdy/PycharmProjects/rezervavau/bms/messages/views.py", line 66, in accept
        Message.accept(request.POST.get('msg'))
      File "/Users/aemdy/PycharmProjects/rezervavau/bms/messages/models.py", line 261, in accept
        thread=thread
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/manager.py", line 149, in create
        return self.get_query_set().create(**kwargs)
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/query.py", line 391, in create
        obj.save(force_insert=True, using=self.db)
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/base.py", line 532, in save
        force_update=force_update, update_fields=update_fields)
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/base.py", line 627, in save_base
        result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/manager.py", line 215, in _insert
        return insert_query(self.model, objs, fields, **kwargs)
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/query.py", line 1633, in insert_query
        return query.get_compiler(using=using).execute_sql(return_id)
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 920, in execute_sql
        cursor.execute(sql, params)
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/backends/util.py", line 47, in execute
        sql = self.db.ops.last_executed_query(self.cursor, sql, params)
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/operations.py", line 201, in last_executed_query
        return cursor.query.decode('utf-8')
      File "/Users/aemdy/virtualenvs/django1.5/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0xe0 in position 115: invalid continuation byte
My script gives me the following body ����. How can I decode it to get ąčę back?
Well, I found a solution myself. I will do some testing now and will let you guys know if anything fails.
I needed to decode the body again:
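What follows is only a sketch of one way to do that second decoding step, not necessarily the exact code from my script. The idea: `get_payload(decode=True)` undoes the transfer encoding (base64 or quoted-printable) but still returns bytes in whatever charset the sender declared, so those bytes have to be decoded with `part.get_content_charset()`. The helper name `decode_text_part`, the UTF-8 fallback and the sample message are assumptions made for illustration. ISO-8859-13 is used in the sample only because it encodes ą as byte 0xe0, the byte named in the traceback; the charset Gmail actually declared is not shown above. The demo at the bottom uses `email.message_from_bytes`, which exists only in Python 3; the helper itself uses calls that are also available in Python 2.7.

```python
import base64
import email


def decode_text_part(part, fallback_charset="utf-8"):
    """Return the text payload of a message part as a Unicode string.

    Hypothetical helper: the name and the fallback are illustrative.
    """
    # decode=True only reverses the Content-Transfer-Encoding;
    # the result is still a byte string in the part's own charset.
    raw = part.get_payload(decode=True)
    if raw is None:
        # Multipart containers have no payload of their own.
        return u""
    # Charset from the Content-Type header, or None if it is missing.
    charset = part.get_content_charset() or fallback_charset
    try:
        return raw.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: keep what can be decoded.
        return raw.decode(fallback_charset, "replace")


# Made-up sample message: the body is "ąčę" in ISO-8859-13 (an assumption).
raw_message = (
    b"From: sender@example.com\r\n"
    b"Subject: test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=ISO-8859-13\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"\r\n"
    + base64.b64encode(u"\u0105\u010d\u0119".encode("iso8859_13"))
    + b"\r\n"
)

msg = email.message_from_bytes(raw_message)  # Python 3 only
body = decode_text_part(msg)
print(body)
```

In the loop from the question, this would replace the plain `body = part.get_payload(decode=True)` assignment, so that a Unicode string rather than charset-specific bytes is what gets saved to the database.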