I am having problems parsing emails with the PHP imap functions. I want to extract the text of the body, but without the HTML link residue (like mailto:xxxx) and without encoding issues. I think I have tried almost everything. The only code that gets me close to the desired result is the following:
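// Try the nested part first; section numbers are passed as strings.
$bodyText = imap_fetchbody($inbox, $email_number, '1.2');
if (strlen((string) $bodyText) === 0)
{
    // Nothing there, so fall back to the first top-level part.
    $bodyText = imap_fetchbody($inbox, $email_number, '1');
}
var_dump($bodyText);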
The problem is that the result still has encoding issues (at least, I think that is the cause). The var_dump output looks something like this:
> From: SomeOne <someone=40domain.com>
> To: Someone Else <someoneelse=40domain.com>
> =Date: lunes 23 de julio de 2012 13:04:43
> Subject: =46wd: =46W: URGE=21=21=21=21 Ley de Reforma del Congreso de 20=11
> =20
> Some text here, blah, blah, blah
> =20
> ---------- =46orwarded message ----------
> From: Whatever <whatever=40domain.com (mailto:whatever=40domain.com)>
> Date: 23 de julio de 2012 12:53
> Subject: =46wd: =46W: URGE=21=21=21=21 Ley de Reforma del Congreso de 20=11
> To: Someone <someone=40domain.com (mailto:someone=40domain=.com)>
> =20
> =20
> Some stuff=21=21
> =20
> ---------- =46orwarded message ----------
> =46rom: samuel l jackson <sanvuco=40domain.com (mailto:sanvuco=40domain.com)>
> Date: 2012/7/23
> Subject: =46W: URGE=21=21=21=21 Ley de Reforma del Congreso de 2011
> To: =22...Scary Monster=C2=B7=C2=B7=C2=B7 =C3=B2=5F=5F=C3=B3=22 <eowyn2=
6=40domain.com (mailto:eowyn26=40domain.com
That is, =40 appears instead of @, and the (mailto:xxxx@domain.com) blocks are still there.
Thank you.
Run the body string through a quoted-printable decoder (sequences such as =40 and =20 are quoted-printable escapes; PHP's quoted_printable_decode() is one option) and you will get the printable, decoded string you're looking for. You can then use a regular expression to get rid of the mailtos and do any other parsing you like. An appropriate function for that task would be:
http://www.php.net/manual/en/function.preg-replace.php
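As a rough illustration of the two steps, here is a minimal sketch that decodes the quoted-printable text and then strips the mailto annotations. The helper name cleanBody() and the regular expression are assumptions made for this example, not something from the question; the pattern only targets the " (mailto:...)" form shown in the dump above and may need adjusting for other mail clients.

```php
<?php
// Hypothetical helper for illustration: decode a quoted-printable body
// and drop the " (mailto:...)" annotations seen in the var_dump output.
function cleanBody(string $rawBody): string
{
    // Turns =40 into @, =21 into !, and joins soft line breaks ("=" at end of line).
    $decoded = quoted_printable_decode($rawBody);

    // Remove " (mailto:address)"; the closing parenthesis is optional so that
    // a truncated annotation is removed as well.
    $cleaned = preg_replace('/\s*\(mailto:[^)\s>]*\)?/i', '', $decoded);

    // preg_replace() returns null on failure; fall back to the decoded text.
    return $cleaned ?? $decoded;
}

$sample = "From: Whatever <whatever=40domain.com (mailto:whatever=40domain.com)>\r\n"
        . "Some stuff=21=21";

echo cleanBody($sample), PHP_EOL;
```

In the question's code, this would be applied to $bodyText before the var_dump() call. This assumes the part really is quoted-printable; for a message whose part uses a different transfer encoding (base64, for instance), a different decoder would be needed. The imap extension's imap_qprint() performs the same kind of decoding and could be used in place of quoted_printable_decode().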