I am trying to parse a Gmail email. I am using IMAP methods, and so far so good.
My problem is with HTML emails. I searched everywhere for a way to convert an HTML body to plain text, but nothing worked for me, so I am trying to do it myself. I take the HTML, strip all the attributes, and now I have an encoding issue.
Some of my emails are in Hebrew, and the Hebrew in the HTML looks like this:
=F0=E0 =F6=F8=E5 =E0=E9=FA=E9 =F7=F9=F8 =E1=E1=F7=F9=E4 =E1=E8=EC=F4=
=E5=EF
I tried converting it from hex to a string, but the result wasn't perfect: some words were missing.
How can I convert it to Hebrew characters?
Thanks a lot,
Elad
It seems you have some encoding issues with the HTML you receive.
You need to decode it using the correct encoding, which for this Hebrew text is Windows-1255.
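To see why the charset matters, here is a small Python illustration (Python is used only as an example language; the question does not say which language the asker is using). The same two bytes from the sample, =F0=E0, decode to unrelated Latin characters under Windows-1252 and to Hebrew letters under Windows-1255.

```python
# The first two escaped bytes from the question (=F0=E0) as raw bytes.
data = bytes([0xF0, 0xE0])

# Decoding with a Western code page produces Latin-looking mojibake.
wrong = data.decode("cp1252")  # 'ðà'

# Decoding with Windows-1255 (Hebrew) produces the intended letters.
right = data.decode("cp1255")  # 'נא'

print(wrong, right)
```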
Here is an approach that works:
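The first part of your problem is that the =F0=E0... sequences are effectively URL-encoded bytes, with = instead of % at the beginning of each escape. So we replace the problematic = characters with % and URL-decode the result. Afterwards, we convert it from the Windows-1252 encoding to the Windows-1255 encoding.

As a side note, the line break in your example string matters: the first line ends with =F4= and the next line starts with =E5=EF. In quoted-printable text (which is what these =XX escapes are), an = at the very end of a line is a soft line break rather than the start of an escape, so remove the = together with the line break, giving =F4=E5=EF, before decoding.

I tested it and it works fine on your string. Good luck (בהצלחה)!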
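A minimal Python sketch of the steps above, assuming the input is the raw text exactly as shown in the question and that its charset is Windows-1255. The helper names `decode_by_replacement` and `decode_with_quopri` are hypothetical. The first mirrors the replace-and-URL-decode trick; the second swaps in the standard-library `quopri` decoder for comparison. This is not the answerer's original code.

```python
import quopri
from urllib.parse import unquote_to_bytes

# Sample from the question; the trailing "=" on the first line is a
# quoted-printable soft line break, not part of an escape.
SAMPLE = (
    "=F0=E0 =F6=F8=E5 =E0=E9=FA=E9 =F7=F9=F8 =E1=E1=F7=F9=E4 =E1=E8=EC=F4=\n"
    "=E5=EF"
)


def decode_by_replacement(text: str, charset: str = "cp1255") -> str:
    """Hypothetical helper: treat each =XX escape as a URL-style %XX."""
    # Remove soft line breaks ("=" at end of line) first.
    text = text.replace("=\r\n", "").replace("=\n", "")
    # Turn =XX into %XX and percent-decode into raw bytes.
    # Caveat: a literal "%" in the text would be misread as an escape.
    raw = unquote_to_bytes(text.replace("=", "%"))
    # The bytes are Windows-1255, so decode them with that codec directly.
    return raw.decode(charset)


def decode_with_quopri(text: str, charset: str = "cp1255") -> str:
    """Hypothetical helper using the standard quoted-printable decoder."""
    # quopri handles soft line breaks and =XX escapes on its own.
    raw = quopri.decodestring(text.encode("ascii"))
    return raw.decode(charset)


if __name__ == "__main__":
    print(decode_by_replacement(SAMPLE))
    print(decode_with_quopri(SAMPLE))
```

Both helpers should yield the same Hebrew sentence for the sample. Decoding the bytes directly as cp1255 skips the Windows-1252 round trip; in Python that round trip could even fail, because cp1252 leaves a few byte values undefined. A real quoted-printable decoder is generally safer than the replacement trick, since a literal % in the message body would be misinterpreted. If you have the whole raw message, Python's `email` package can also apply the `Content-Transfer-Encoding` and charset declared in each part's headers.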