I want to write a regular expression to filter all the junk out of an email that is being pulled in through the imaplib and email modules in my Python script below. I’m thinking a regex is best, but feel free to suggest better solutions. Any idea why the email text has an equals sign in the word be=tter below? The original email has it as better.
Python snippet:
emailMessage = email.message_from_string
print emailMessage.get_payload()
Print Text:
>=20
> >>>>
> >>>> Hope this makes it through you spam filter but couldn't think of a be=
tter subject.
> >>>>
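As Karl Knechtel says in the comments, your message is encoded as quoted-printable. That is where the stray characters come from: an = at the end of a line is a soft line break (which is why better shows up as be= followed by tter on the next line), and =20 is an encoded space. To decode that, use quopri.decodestring().
Using regexes to strip out the “junk” characters is going to be inefficient, and also means that whenever a new one turns up in your input down the line, you’ll have to modify your code.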
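The decoding step can be sketched as follows. This is a minimal example, assuming Python 3 (where quopri.decodestring() returns bytes) and a UTF-8 body; the raw value is a hypothetical stand-in for the payload printed in the question, not the real message.

```python
import quopri

# Hypothetical sample: part of the quoted-printable payload from the question.
# "=\n" is a soft line break and "=20" is an encoded space.
raw = (
    b">=20\n"
    b"> >>>> Hope this makes it through you spam filter but couldn't think of a be=\n"
    b"tter subject.\n"
)

# decodestring() returns bytes; the UTF-8 charset here is an assumption.
decoded = quopri.decodestring(raw).decode("utf-8")
print(decoded)
```

For a non-multipart message, emailMessage.get_payload(decode=True) should give a similar result, since it decodes the payload according to the message's Content-Transfer-Encoding header.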
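However, if after decoding you want to lose the > characters (and any whitespace between them) at the beginning of each line, then a regex is a reasonable solution for that. Putting (?m) at the start of the pattern makes the regex multiline, by the way, so that ^ matches at the start of every line rather than only at the start of the string.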
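Stripping the quote markers might look like the sketch below. The pattern and the names QUOTE_PREFIX and strip_quote_markers are assumptions chosen for illustration, not necessarily what the original answer used, and the sample string is a shortened, already-decoded version of the text in the question.

```python
import re

# Assumed pattern: at the start of each line ((?m) makes ^ match per line),
# match one ">" followed by any mix of further ">" characters, spaces, or tabs.
QUOTE_PREFIX = re.compile(r"(?m)^>[> \t]*")

def strip_quote_markers(text):
    """Remove leading '>' quote markers, and the whitespace between them, from every line."""
    return QUOTE_PREFIX.sub("", text)

# Hypothetical, already-decoded sample text.
sample = "> \n> >>>>\n> >>>> Hope this makes it through\n> >>>>\n"
cleaned = strip_quote_markers(sample)
print(cleaned)
```

Because the character class does not include a newline, the match never runs past the end of a line, and a > that appears in the middle of a line is left alone.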