I am writing a tool in Python that backs up and restores emails in Gmail via IMAP.
In some cases the emails backed up from Gmail contain a weird character, ^@, and Gmail IMAP then refuses to ingest them again.
Delivered-To: xxxxx@lxxxxxx
Received: by 1x.xx.xx.xx with SMTP id jjjjjjjj;
Tue, 14 Jun 2011 16:56:26 -0700 (PDT)
Received: by x.x.x.x with SMTP id xxxx.xxx;
Tue, 14 Jun 2011 16:56:16 -0700 (PDT)
Return-Path: <foo.bar@email.com>
Delivery-Date: Mon, 23 Aug 2010 17:58:56 +0200
Received: from xxxxx (xxxxx [x.x.x.x])
by xxxx (node=xxx) with ESMTP (xxx)
id xxx ; Mon, 23 Aug 2010 17:58:56 +0200
Received: from [x] (x)
by x (x) with x (x)
id x; Mon, 23 Aug 2010 17:58:50 +0200
Message-ID: <x@foo.com>
Date: Mon, 23 Aug 2010 17:58:48 +0200
From: Foo Bar <foo.bar@email.com>
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; de; rv:1.9.2.8) Gecko/20100802 Thunderbird/3.1.2
MIME-Version: 1.0
To: bar.foo@email.com <x>
Subject: The subject
Content-Type: text/plain; charset=ISO-8859-15; format=flowed
Content-Transfer-Encoding: 8bit
X-Provags-ID: xxxxxxxxxxx=
Envelope-To: foo.bar@email.com
Hello All,
blah blah blah
^@
At the end there is this special character. In other emails it sometimes appears in the middle.
When I store the email on disk (eml format), I just save it and read it back.
The encoding seems correct.
What is this character?
Am I doing something wrong when I store the email as eml?
A bit of guidance would be appreciated.
Thanks.
Short answer: ^@ is the usual way a null character (ASCII 0) is displayed. You can strip null characters from the body of the email before sending it back to Google.**
Longer answer:
Old email (according to RFC 822) was allowed to contain null characters. New email (according to RFC 2822, published in 2001) is not. Note RFC 2822 reads: “Differences from earlier standards… ASCII 0 (null) removed.”
It’s entirely possible that Gmail accepts 822-style emails via SMTP (that’s how the email first got to your inbox) but only 2822-style emails via IMAP (which is why you can’t put it back via IMAP).
** Note: Don’t blindly strip nulls from MIME documents included in the email. RFC 2822 “specifies that messages are made up of characters in the US-ASCII range of 1 through 127. There are other documents, specifically the MIME document series [RFC2045, RFC2046, RFC2047, RFC2048, RFC2049], that extend [RFC 2822] to allow for values outside of that range.”
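As a rough illustration of the short answer and the caveat above, here is a minimal sketch in Python. The helper names `has_binary_parts` and `strip_nulls` are made up for this example, and treating `Content-Transfer-Encoding: binary` as the signal to leave a message alone is my own assumption about where nulls might be legitimate, not something Gmail documents.

```python
import email
from email import policy


def has_binary_parts(raw: bytes) -> bool:
    """Return True if any MIME part declares Content-Transfer-Encoding: binary.

    Assumption for this sketch: only such parts may legitimately carry raw
    null bytes, so a message containing one is left untouched.
    """
    msg = email.message_from_bytes(raw, policy=policy.compat32)
    for part in msg.walk():
        cte = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
        if cte == "binary":
            return True
    return False


def strip_nulls(raw: bytes) -> bytes:
    """Remove ASCII 0 bytes from a raw message, unless it has binary parts."""
    if has_binary_parts(raw):
        return raw
    return raw.replace(b"\x00", b"")
```

You would read the .eml file in binary mode, pass the bytes through `strip_nulls`, and hand the result to `imaplib.IMAP4.append(mailbox, flags, date_time, message)`. Whether Gmail then accepts the message is something you will have to verify against your own mailbox.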