I am trying to extract images embedded in an email. The problem is that the image I save is unreadable, and I can’t figure out why.
The email (saved as a file, which I load at the beginning of the code):
MIME-Version: 1.0
Received: by 10.100.120.7 with HTTP; Tue, 18 Oct 2011 10:36:48 -0700 (PDT)
In-Reply-To: <8B4FDE07A4759840B84FD04B4C88100B010135E81D8C@fxildc03.forexmanage.com>
References: <8B4FDE07A4759840B84FD04B4C88100B010135E81D8C@fxildc03.forexmanage.com>
Date: Tue, 18 Oct 2011 19:36:48 +0200
Delivered-To: s.shpiz@gmail.com
Message-ID: <CAEb-As9XVmciajFAwEaFyF8CE4QG0t-Z5zFDDpMWXLqaBur1sA@mail.gmail.com>
Subject: openme
From: Simeon Shpiz <s.shpiz@gmail.com>
To: me <s.shpiz@gmail.com>
Content-Type: multipart/related; boundary=001636c5977303b92404af962ba6

--001636c5977303b92404af962ba6
Content-Type: multipart/alternative; boundary=001636c5977303b91d04af962ba5

--001636c5977303b91d04af962ba5
Content-Type: text/plain; charset=ISO-8859-1

****

--001636c5977303b91d04af962ba5
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div class=3D"gmail_quote"><div lang=3D"EN-US" link=3D"blu=
e" vlink=3D"purple"><div><p class=3D"MsoNormal"><span style=3D"font-size:11=
.0pt;color:#1F497D"><img width=3D"15" height=3D"13" src=3D"cid:image003.png=
@01CC8DCD.30A2A7C0"></span><span style=3D"font-size:11.0pt;color:#1F497D"><=
u></u><u></u></span></p>
</div>
</div></div><br></div>

--001636c5977303b91d04af962ba5--
--001636c5977303b92404af962ba6
Content-Type: image/png; name="image003.png"
Content-Transfer-Encoding: base64
Content-ID: <image003.png@01CC8DCD.30A2A7C0>
X-Attachment-Id: 3e79c375acccec3d_0.1

iVBORw0KGgoAAAANSUhEUgAAAA4AAAANCAIAAAAWvsgoAAAAAXNSR0IArs4c6QAAAAlwSFlzAAAO
yAAADsMBrahYpwAAAItJREFUKFNj/P//PwNxgIk4ZWBVQFOBoBsMsGqrqqr6CgYsaNIPHz6EiMjJ
yb19+xbISE9PLy4uBjLQlSLrFBYWBnITExN9fHyADMJulZCQgOgnrFRUVJRYpXAnETb19evXxJr6
4sULiFJ8IfDt2zegii1btmRkZGBRKi8vjxbSwKjJysoCCjISnwYATtwwhahioZoAAAAASUVORK5C
YII=
--001636c5977303b92404af962ba6--
The Python code I am using:
import email
from BeautifulSoup import BeautifulSoup
message = email.message_from_file(open(r'C:\shpiz\test\msg\12248'))
cid_list = []
images = []
for part in message.walk():
    if str(part.get_content_type()) == 'text/html':
        soup = BeautifulSoup(part.get_payload(decode=True))
        cid = '<%s>'%soup('img')[0]['src'][4:]
        cid_list.append(cid)
for part in message.walk():
    if part.get('Content-ID') in cid_list :
        images.append((part.get_filename(),part.get_payload(decode=True)))
for name, image in images:
    with open(r'c:\shpiz\test\%s'%name,'w') as f:
        f.write(image)
Unfortunately, the saved image is corrupt: no program will open it.
I compared the original and the new image files in Notepad++ and there is a difference: my generated copy appears to contain a line break that is not present in the original. That is not the only difference, though, because deleting that line in Notepad++ didn’t make the image openable. The difference I described can be seen here.
I would appreciate any help in finding the problem.
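You’re writing the image in text mode, and Python is mangling the line endings: on Windows, every \n byte written through a text-mode file comes out as \r\n, which corrupts binary data such as a PNG. Open the file in 'wb' mode to write the bytes verbatim.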
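As a rough sketch of that fix, here is a self-contained version written for Python 3 (the question's code looks like Python 2, so treat this as an adaptation, not the original script). The names save_inline_images and simulate_windows_text_mode are hypothetical helpers made up for illustration, RAW is a trimmed-down stand-in for the saved email that keeps only the image part, and BOUND is a shortened placeholder boundary. The second helper only imitates, in pure Python, what a Windows text-mode write does to the bytes.

```python
import email
import os
import tempfile

# Base64 body of image003.png, copied from the message in the question.
PNG_B64 = (
    "iVBORw0KGgoAAAANSUhEUgAAAA4AAAANCAIAAAAWvsgoAAAAAXNSR0IArs4c6QAAAAlwSFlzAAAO\n"
    "yAAADsMBrahYpwAAAItJREFUKFNj/P//PwNxgIk4ZWBVQFOBoBsMsGqrqqr6CgYsaNIPHz6EiMjJ\n"
    "yb19+xbISE9PLy4uBjLQlSLrFBYWBnITExN9fHyADMJulZCQgOgnrFRUVJRYpXAnETb19evXxJr6\n"
    "4sULiFJ8IfDt2zegii1btmRkZGBRKi8vjxbSwKjJysoCCjISnwYATtwwhahioZoAAAAASUVORK5C\n"
    "YII=\n"
)

# Assumption: a minimal stand-in for the saved email with a placeholder boundary.
RAW = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/related; boundary=BOUND\n"
    "\n"
    "--BOUND\n"
    'Content-Type: image/png; name="image003.png"\n'
    "Content-Transfer-Encoding: base64\n"
    "Content-ID: <image003.png@01CC8DCD.30A2A7C0>\n"
    "\n"
    + PNG_B64 +
    "--BOUND--\n"
)


def save_inline_images(message, out_dir):
    """Write every image/* part of `message` into `out_dir`; return the paths."""
    paths = []
    for part in message.walk():
        if part.get_content_maintype() != "image":
            continue
        data = part.get_payload(decode=True)  # bytes after base64 decoding
        # Keep only the final path component of whatever name the sender chose.
        name = os.path.basename(part.get_filename() or "unnamed.bin")
        path = os.path.join(out_dir, name)
        # 'wb' is the fix: binary mode writes the bytes exactly as given.
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    return paths


def simulate_windows_text_mode(data):
    """Imitate the newline translation a Windows text-mode write applies."""
    return data.replace(b"\n", b"\r\n")


if __name__ == "__main__":
    message = email.message_from_string(RAW)
    for path in save_inline_images(message, tempfile.mkdtemp()):
        with open(path, "rb") as f:
            saved = f.read()
        # A valid PNG starts with this 8-byte signature.
        print(path, saved[:8] == b"\x89PNG\r\n\x1a\n")
```

The PNG signature itself contains \r\n and \n bytes, so the translation damages the file from its first few bytes onward, and any other 0x0A byte later in the data gets the same treatment. That would explain why removing a single visible line break in Notepad++ did not repair the image.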