I am using C#.
string content = " 4 marco bob 53 AUSTRIA (Jan. 13, 2012) – McDonald Janruary 15, 2021 July 15, 2923 June 2 2343 7/25/23 08/22/3323";
This should recognize all the dates except "4 marco bob 53", which is obviously not a date. However, my rules (below) match "4 marco bob 53", and I cannot figure out how to avoid matching it (or similar strings).
I am trying to match all the dates in the string above. I wrote 3 rules to match some common date patterns.
For example:
Pattern f0: 5/2/2012
Pattern f1: 3 March 1900 or 3 Mar 1990 or 3 MAR. 1990, etc.
Pattern f2: Jan. 4, 2021 or January 4 2021, etc.
string f0 = "([0-9]{1,2})/([0-9]{1,2})/([0-9]{2,4})";
string f1 = "([0-9]{1,2})\\s+([jJ][aA][nN].*?|[fF][eE][bB].*?|[mM][aA][rR].*?|[aA][pP][rR].*?|[mM][aA][yY].*?|[jJ][uU][nN].*?|[jJ][uU][lL].*?|[aA][uU][gG].*?|[sS][eE][pP].*?|[oO][cC][tT].*?|[nN][oO][vV].*?|[dD][eE][cC].*?)\\s+([0-9]{2,4})";
string f2 = "([jJ][aA][nN].*?|[fF][eE][bB].*?|[mM][aA][rR].*?|[aA][pP][rR].*?|[mM][aA][yY].*?|[jJ][uU][nN].*?|[jJ][uU][lL].*?|[aA][uU][gG].*?|[sS][eE][pP].*?|[oO][cC][tT].*?|[nN][oO][vV].*?|[dD][eE][cC].*?)\\s+([0-9]{1,2})[\\s,]+([0-9]{2,4})";
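To narrow this down, the unwanted match can be reproduced with just the `mar` branch of f1. This is a reduced sketch, and the names `ReducedRepro`, `F1MarOnly`, and `FirstMatch` are made up for illustration. It suggests that the lazy `.*?` is what lets the month token keep growing across "co bob" until it reaches whitespace followed by digits.

```csharp
using System.Text.RegularExpressions;

// Hypothetical reduced reproduction; the names are illustrative only.
public static class ReducedRepro
{
    // Only the "mar" alternative of f1 is kept; the rest of the rule is unchanged.
    private const string F1MarOnly =
        @"([0-9]{1,2})\s+([mM][aA][rR].*?)\s+([0-9]{2,4})";

    // Returns the text of the first match, or an empty string when nothing matches.
    public static string FirstMatch(string text)
    {
        return Regex.Match(text, F1MarOnly).Value;
    }
}
```

Calling `ReducedRepro.FirstMatch(" 4 marco bob 53 AUSTRIA")` returns `4 marco bob 53`, because `.*?` may consume any characters, including spaces, as long as the rest of the pattern eventually matches.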
I am new to regex, so I am sure I am doing some silly things (like not using the case-insensitive option), so let me know how I can improve this as well.
This is for learning regex, not for learning how to use library functions.
I aggregated some of the posted answers to do what I wanted. The result seems to find dates in free text reasonably well. Thanks to all posters.
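For anyone landing here, this is a minimal sketch of how the three rules could be tightened, not the exact code from any answer. It assumes three changes: `RegexOptions.IgnoreCase` instead of spelling out `[jJ][aA][nN]`, a month token restricted to letters plus an optional dot (`[a-z]*\.?`) instead of `.*?`, and `\b` word boundaries around each rule. The names `DateFinder`, `Month`, `Patterns`, and `FindDates` are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not taken from the posted answers.
public static class DateFinder
{
    // Month prefix, then any remaining letters, then an optional dot.
    // Letters only, so the token cannot run across spaces the way ".*?" did.
    private const string Month =
        @"(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*\.?";

    private static readonly string[] Patterns =
    {
        // f0: 5/2/2012
        @"\b([0-9]{1,2})/([0-9]{1,2})/([0-9]{2,4})\b",
        // f1: 3 March 1900, 3 MAR. 1990
        @"\b([0-9]{1,2})\s+(" + Month + @")\s+([0-9]{2,4})\b",
        // f2: Jan. 4, 2021, January 4 2021
        @"\b(" + Month + @")\s+([0-9]{1,2})[\s,]+([0-9]{2,4})\b",
    };

    // Runs every rule over the text and returns the matched strings
    // ordered by their position in the input.
    public static List<string> FindDates(string text)
    {
        return Patterns
            .SelectMany(p => Regex.Matches(text, p, RegexOptions.IgnoreCase).Cast<Match>())
            .OrderBy(m => m.Index)
            .Select(m => m.Value)
            .ToList();
    }
}
```

On the `content` string from the question, `DateFinder.FindDates(content)` skips "4 marco bob 53" and returns the six remaining dates in order. Two limitations of this sketch: any word that starts with a month prefix still counts as a month (so "marco 5, 2020" would match f2), and overlapping matches from different rules are not de-duplicated.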