I’m trying to parse a formatted email that looks something like this:
From: Mr. Bob Simon Jones
Email: moo@cows.com
Comments: Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris hendrerit, nibh a tristique malesuada, tellus nibh pharetra mauris, id tincidunt lacus turpis vel risus.
Vestibulum laoreet venenatis mauris sit amet suscipit. Cras vel pharetra nisl. Suspendisse venenatis ante quis tellus luctus id ornare sem pretium. Cras sodales tristique mauris sagittis ullamcorper.
Ut sit amet urna magna. Nullam et odio sit amet mauris tempus egestas. Donec eget risus nec lectus adipiscing convallis. Pellentesque in velit enim. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae; Morbi quis ante diam. Etiam rhoncus leo vulputate ligula luctus volutpat. Praesent luctus, justo eget auctor viverra, diam turpis fringilla elit, non commodo massa arcu et eros. Cras elementum faucibus elit, sollicitudin luctus mi dictum a.
Address: First line, Second Line, Third line pe2 8pd, Fourth Line
Date of Visit: 25/06/2011
I’ve got a regular expression which works if that’s the only text present, but when there’s a load of junk text after it, it becomes extremely slow. When running in a .NET app it doesn’t seem to complete at all.
The regular expression is:
.*From: (?<title>Mrs\.|Mr\.|Miss\.|Ms\.) (?<firstName>(\w| )*)(?<=. )(?<surname>(\w| )*)\s*
Email: (?<email>.*)\s*
Comments: (?<comments>(.|\s)*)\s*
Address: (?<address1>[^,]*), (?<address2>[^,]*), (?<address3>[^,]*),(?<address4>.*)\s*
Date of Visit: (?<dateOfVisit>\d\d/\d\d/\d\d\d\d).*
The first line captures every name except the final one as the first name, and the final name as the surname.
I assume it’s probably got something to do with this:
http://www.regular-expressions.info/catastrophic.html
But I can’t quite figure it out. Wondering if anyone might be able to point me in the right direction?
Thanks for your time
Here are some enhancements to the regex; can you test it out?
Use it with the single-line option (RegexOptions.Singleline in .NET), so that . also matches newlines.
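
As a rough illustration of the kind of tightening that usually helps here, this is a sketch in Python rather than .NET (my assumption, purely so it is easy to run; Python writes named groups as (?P<name>...) where .NET uses (?<name>...)). The likely culprits in the original are the leading and trailing .*, and especially (.|\s)*, where . and \s can both match the same character and so give the engine a huge number of ways to retry. The sketch drops the outer .*, anchors on the start of the From: line, keeps single-line fields from crossing line breaks, and uses a lazy .*? for the comments. The names PATTERN, SAMPLE and parse_message are made up for the example, and it assumes the name has at least two words.

```python
import re

# Sketch only: a tightened version of the pattern from the question.
# Assumptions: the fields always appear in this order, one per line
# (apart from Comments), and the name has at least two words.
PATTERN = re.compile(
    # Anchor at the start of a line instead of using a leading ".*".
    r"^From: (?P<title>Mrs\.|Mr\.|Miss\.|Ms\.) "
    # Everything up to the last word is the first name; the last word is the surname.
    r"(?P<firstName>\w+(?: \w+)*) (?P<surname>\w+)[ \t]*\r?\n"
    # Single-line fields use negated classes so they cannot run past the line end.
    r"Email: (?P<email>[^\r\n]*)\r?\n"
    # Lazy match: stop at the first line break that is followed by "Address: ".
    r"Comments: (?P<comments>.*?)\r?\n"
    r"Address: (?P<address1>[^,\r\n]*), (?P<address2>[^,\r\n]*), "
    r"(?P<address3>[^,\r\n]*), ?(?P<address4>[^\r\n]*)\r?\n"
    # No trailing ".*": search() simply stops once the date has matched.
    r"Date of Visit: (?P<dateOfVisit>\d\d/\d\d/\d{4})",
    re.MULTILINE | re.DOTALL,
)


def parse_message(text):
    """Return the captured fields as a dict, or None if nothing matches."""
    match = PATTERN.search(text)
    return match.groupdict() if match else None


# Shortened stand-in for the email in the question, surrounded by junk text.
SAMPLE = (
    "some junk before\n"
    "From: Mr. Bob Simon Jones\n"
    "Email: moo@cows.com\n"
    "Comments: Lorem ipsum dolor sit amet.\n"
    "Vestibulum laoreet venenatis mauris.\n"
    "Address: First line, Second Line, Third line pe2 8pd, Fourth Line\n"
    "Date of Visit: 25/06/2011\n"
    + "junk text after the message\n" * 1000
)

if __name__ == "__main__":
    print(parse_message(SAMPLE))
```

If this shape works for your data, the same structure should carry over to .NET by switching the group syntax back to (?<name>...) and passing RegexOptions.Singleline | RegexOptions.Multiline, though I have not run it there.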