I have a large block of text that I need to search for a birth or death date. The date usually comes in one of the following formats:
some more text. Born December 5, 1942 in Sumner and more text
or
some text born in City, State, on August 8, 1922, more text
or
some text died Wednesday, November 3, 2010, more text
or
some text passed away Friday, December 19, 2008 more text
or
some text died January 11, 2007, more text
In short, the date usually comes a few words after the word "born" (or "died" / "passed away").
I assume that the best way to get this date is a regex, but correct me if I am wrong here.
Here is what I came up with so far, but it is still far from matching only the date:
(?=born\s|died\s|passed\saway\s)(\w+.*)(\w+\s\d+,\s\d+)
The problem is that my regex doesn't quite work: it swallows the month name. How do I correct this, or is there a better regex or approach?
I know I could use the pattern below to get only the date, but I need to know the event as well:
(\w+\s[0-9]{1,2},\s[0-9]{2,4})
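You could try using a lazy repeat (.*? instead of .*), so that the first group stops as soon as the date pattern can match instead of consuming the month name: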
(?=born\s|died\s|passed\saway\s)(\w+.*?)(\w+\s\d+,\s\d+)
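As a rough sketch of how the lazy repeat can return both the event and the date, here is one way it might look in Python. The choice of Python, the named groups event and date, the helper find_events, and the 80-character cap on the gap are all assumptions made for illustration, not part of the pattern above. It also matches case-insensitively, since a lowercase "born" would otherwise miss "Born" at the start of a sentence.

```python
import re

# Assumption: dates look like "Month D, YYYY"; an optional weekday
# ("Wednesday, ") may sit between the event word and the date.
PATTERN = re.compile(
    r"\b(?P<event>born|died|passed\s+away)\b"    # the event keyword
    r".{0,80}?"                                  # lazy gap, capped at 80 chars (arbitrary)
    r"(?P<date>[A-Za-z]+\s+\d{1,2},\s+\d{4})",   # e.g. "August 8, 1922"
    re.IGNORECASE | re.DOTALL,
)


def find_events(text):
    """Return (event, date) pairs in the order they appear in text."""
    return [
        (m.group("event").lower(), m.group("date"))
        for m in PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "some text born in City, State, on August 8, 1922, more text. "
        "some text died Wednesday, November 3, 2010, more text"
    )
    for event, date in find_events(sample):
        print(event, "->", date)
```

Because the gap is lazy, the match ends at the first thing after the event word that looks like a date, so the weekday and the "in City, State, on" text are skipped rather than captured. The cap on the gap is there so that a "born" with no date nearby does not pair up with an unrelated date much later in the text; adjust or drop it to suit the data.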