I’m trying to write an application that periodically receives e-mails and writes each one into a database. But sometimes I get a ‘Re:’ e-mail that looks something like this:
New message
On September 21, 2010 24:26 Someone wrote (a):
| Old message
|
The format depends on the e-mail provider.
Is there any library that helps remove the quoted ‘Re:’ part from an e-mail message? Maybe the IMAP server can do that? I have all the previous e-mails from the thread in the database, so I could fetch them and search for their text in the new message.
Personally, I think you are out of luck here, because the quoted copy is part of the message body. To remove it you would have to process the body and write an extraction method for each known format (the obvious problem being that you cannot know every possible format).
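To give an idea of what such a per-format extraction method might look like, here is a minimal Python sketch. The function name `strip_quoted_reply` and the two header patterns are my own assumptions, chosen to match the example in the question plus one other common style. Real clients produce many more variants (and localized ones), so treat this as a heuristic, not a complete solution.

```python
import re

# Hypothetical patterns for two common reply headers; real mail clients
# use many more formats, so this list is illustrative only.
REPLY_HEADER_PATTERNS = [
    re.compile(r"^On .+ wrote.*:\s*$"),
    re.compile(r"^-+\s*Original Message\s*-+\s*$", re.IGNORECASE),
]

# Characters that commonly prefix quoted lines ("|" appears in the question).
QUOTE_PREFIXES = (">", "|")


def strip_quoted_reply(body: str) -> str:
    """Return the text above the first recognized reply header or quoted line."""
    kept = []
    for line in body.splitlines():
        stripped = line.strip()
        if any(p.match(stripped) for p in REPLY_HEADER_PATTERNS):
            break  # everything from the reply header onward is the old message
        if stripped.startswith(QUOTE_PREFIXES):
            break  # a quoted line without a header we recognize
        kept.append(line)
    return "\n".join(kept).rstrip()
```

A message in a format that matches none of the patterns is returned unchanged, which is exactly the weakness described above: every new provider format needs its own rule.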
So, instead of parsing the body, why don’t you persist the whole message in the database? The size of a message should normally not be a problem for a modern DBMS. If it really is a problem, you can always compress the body and store it in a BLOB.
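As a rough illustration of the compress-and-store idea, here is a sketch using Python's standard `zlib` and `sqlite3` modules. SQLite, the `messages` table, and the helper names `store_message` and `load_message` are assumptions made for the example; the same approach should carry over to any DBMS with a BLOB column.

```python
import sqlite3
import zlib


def store_message(conn: sqlite3.Connection, raw_message: bytes) -> int:
    """Compress the raw message and insert it as a BLOB; return the new row id."""
    # Hypothetical schema: one table holding the compressed body.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages "
        "(id INTEGER PRIMARY KEY, body BLOB NOT NULL)"
    )
    cur = conn.execute(
        "INSERT INTO messages (body) VALUES (?)",
        (zlib.compress(raw_message),),
    )
    conn.commit()
    return cur.lastrowid


def load_message(conn: sqlite3.Connection, message_id: int) -> bytes:
    """Fetch a stored message by id and return the decompressed bytes."""
    row = conn.execute(
        "SELECT body FROM messages WHERE id = ?", (message_id,)
    ).fetchone()
    if row is None:
        raise KeyError(message_id)
    return zlib.decompress(row[0])
```

Usage would be along the lines of `mid = store_message(conn, raw_bytes)` followed later by `load_message(conn, mid)`. Quoted replies repeat earlier text, so they tend to compress well.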