I’m looking for assistance creating a pattern match to ingest emails. The end goal is to receive an incoming message and extract just the reply message, not all the trailing junk (previous threads, signature, date stamp header, etc.)
Here are the three formats:
Format 1:
The Message is here, etc etc can span a random # of lines
On Nov 17, 2010, at 4:18 PM, Person Name wrote:
lots of junk down here which we don't want
Format 2:
The Message is here, etc etc can span a random # of lines
On Nov 17, 2010, at 4:18 PM, Site <yadaaaa+adad@sitename.com> wrote:
lots of junk down here which we don't want
Format 3:
The Message is here, etc etc can span a random # of lines
On Fri, Nov 19, 2010 at 1:57 AM, <customerserviceonline@pge.com> wrote:
lots of junk down here which we don't want
For all the examples above, I’d like to create a pattern match that finds the first instance of the second line (the one ending in "wrote:") and returns only what’s above it. I don’t want the delimiter line itself.
I can’t match on the date stamp, but I can match on everything after the comma as that’s in my control.
So the idea is to look for either of these two static strings:
, Site <yadaaaa+adad@sitename.com> wrote:
, Person Name wrote:
And then take everything above that position. What do you think? Is this possible?
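A minimal Python sketch of the idea above. It assumes the two static suffixes are exactly the ones listed and that the delimiter always sits on its own line. `DELIMITER_SUFFIXES` and `strip_quoted_reply` are hypothetical names. The pattern matches any single line that ends in one of the suffixes, so the varying date stamp in front doesn't matter, and everything before the start of that line is returned.

```python
import re

# Static suffixes taken from the question. The date stamp in front of them
# varies, so it is matched loosely with [^\n]* (anything on the same line).
DELIMITER_SUFFIXES = [
    ", Site <yadaaaa+adad@sitename.com> wrote:",
    ", Person Name wrote:",
]

# re.MULTILINE makes ^ and $ match at line boundaries, so a match covers
# exactly one whole delimiter line. re.escape protects "+" and "." in the email.
DELIMITER_RE = re.compile(
    r"^[^\n]*(?:"
    + "|".join(re.escape(s) for s in DELIMITER_SUFFIXES)
    + r")[ \t]*$",
    re.MULTILINE,
)


def strip_quoted_reply(body: str) -> str:
    """Return the text above the first delimiter line (hypothetical helper)."""
    match = DELIMITER_RE.search(body)  # search() finds the FIRST delimiter line
    end = match.start() if match else len(body)
    # Slicing at match.start() drops the delimiter line and everything below;
    # rstrip() removes the trailing newline left above it.
    return body[:end].rstrip()


email = (
    "The Message is here, etc etc can span a random # of lines\n"
    "On Nov 17, 2010, at 4:18 PM, Person Name wrote:\n"
    "lots of junk down here which we don't want\n"
)
print(strip_quoted_reply(email))
# The Message is here, etc etc can span a random # of lines
```

Format 3 ends in `, <customerserviceonline@pge.com> wrote:`, which neither suffix covers. It would need a third entry in the list.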
Well, this would be a regexp solution:
You only provided one example, so this might not be perfect, but it should do the job quite well.
Then you have to get the first captured group with $1, or [1] if you are using match 🙂
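For comparison, here is a hedged Python sketch of the capture-group approach. It is not the regex from this answer. It assumes a generic delimiter: any line that starts with `On` and ends with `wrote:`. In Python, `match.group(1)` plays the role of `$1`. `REPLY_RE` and `extract_reply` are hypothetical names.

```python
import re

# Assumed generic delimiter: a line starting with "On" and ending with "wrote:".
# re.DOTALL lets (.*?) span several lines; re.MULTILINE lets ^ and $ match at
# line boundaries. The lazy (.*?) stops at the FIRST delimiter line.
REPLY_RE = re.compile(r"(.*?)^On\b[^\n]*wrote:[ \t]*$", re.DOTALL | re.MULTILINE)


def extract_reply(body: str) -> str:
    """Return everything above the first "On ... wrote:" line (hypothetical helper)."""
    match = REPLY_RE.match(body)
    if match is None:
        return body.rstrip()  # no delimiter: keep the whole message
    return match.group(1).rstrip()  # group(1) is the first captured group, like $1


formats = [
    "The Message is here, etc etc can span a random # of lines\n"
    "On Nov 17, 2010, at 4:18 PM, Person Name wrote:\n"
    "lots of junk down here which we don't want\n",
    "The Message is here, etc etc can span a random # of lines\n"
    "On Nov 17, 2010, at 4:18 PM, Site <yadaaaa+adad@sitename.com> wrote:\n"
    "lots of junk down here which we don't want\n",
    "The Message is here, etc etc can span a random # of lines\n"
    "On Fri, Nov 19, 2010 at 1:57 AM, <customerserviceonline@pge.com> wrote:\n"
    "lots of junk down here which we don't want\n",
]
for body in formats:
    print(extract_reply(body))
```

The lazy `(.*?)` makes the match stop at the first delimiter, so older quoted replies further down are ignored. The trade-off of a generic pattern is that a line in the reply itself that starts with "On" and ends with "wrote:" would be mistaken for the delimiter.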
Btw, you can use the /i option on the regex to make it case-insensitive.
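In Python, the counterpart of the `/i` modifier is the `re.IGNORECASE` flag. Here is a small sketch, assuming the sender's client might change the case of the delimiter text. The lowercased delimiter line below is made up for illustration, and `PERSON_RE` is a hypothetical name.

```python
import re

# re.IGNORECASE is Python's counterpart of the /i modifier: letters match
# regardless of case. The suffix is the question's own delimiter text.
PERSON_RE = re.compile(
    r"^[^\n]*" + re.escape(", Person Name wrote:") + r"[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)

# Made-up message whose delimiter line uses different capitalisation.
body = (
    "The Message is here\n"
    "On Nov 17, 2010, at 4:18 PM, person name WROTE:\n"
    "lots of junk down here which we don't want\n"
)
match = PERSON_RE.search(body)
reply = body[: match.start()].rstrip() if match else body.rstrip()
print(reply)  # The Message is here
```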