So in my Python script, I open up a text file containing dates in the format “January, 26, 1991”.
Here is my regular expression:
import re
pattern = r"""
(?:(September|April|June|November),\ (0?[1-9]|[12]\d|30),\ ((?:19|20)\d\d)) # Months with 30 days
|(?:(January|March|May|July|August|October|December),\ (0?[1-9]|[12]\d|3[01]),\ ((?:19|20)\d\d)) # Months with 31 days
|(?:February,\ (?:(?:(0?[1-9]|1\d|2[0-8]),\ ((?:19|20)\d\d))|(?:(29),\ ((?:(?:19|20)(?:04|08|12|16|20|24|28|32|36|40|44|48|52|56|60|64|68|72|76|80|84|88|92|96))|2000)))) # February with 28 days or 29 with a leap year
"""
r = re.compile(pattern, re.VERBOSE)
This regular expression should match any real date including February 29 on leap years.
The problem I am having is figuring out a way to go through my opened text file and put all of the matched dates into a list. I’ve tried using .match, .search, .split and the other ones but I haven’t had any luck. Is there a way to put all matches into a list as strings so that I can easily compare the list to another and find all the dates that are in both lists? Basically I would like a list to come out looking like
["January, 1, 1990", "February, 29, 2012", "December, 25, 1945", ...]
Also, please let me know if the regular expression I have is correct. I modified it from the answer to another question I had and I’m not sure whether I have it right since I’m not able to see whether the dates in my text file were matched or not.
You didn’t mention re.findall() in the list of things you tried. That gives you a list of all regex matches.
However, you need to use all non-capturing groups (?:...), or you’ll get a list of all matched groups (...) instead of the full matched strings. Therefore, I suggest making every group in your pattern non-capturing.
But do you really need to validate the correctness of the dates? Are you expecting false dates like February, 31, 2000 to turn up in your data? If not, you could simplify your regex enormously. Or at least delegate date validation to a date parsing function, which is better equipped for this task than a monstrous regex.
For example, a loose pattern that only checks for a month name followed by a numeric day and year matches nonsense like January, 0, 1999 or February, 31, 2000, but would it really matter?
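If it does matter, here is a rough sketch of the "loose regex plus date parser" idea. It is not the code from the original post: the names DATE_RE, find_dates and common_dates are made up for illustration, and it assumes English month names (what %B gives you in the default C locale). Every group is non-capturing, so re.findall() returns whole date strings, and datetime.strptime() rejects impossible dates.

```python
import re
from datetime import datetime

# Loose pattern: month name, one- or two-digit day, year in 1900-2099.
# All groups are non-capturing, so findall() returns the full match.
DATE_RE = re.compile(
    r"\b(?:January|February|March|April|May|June|July|August"
    r"|September|October|November|December), \d{1,2}, (?:19|20)\d\d\b"
)


def find_dates(text):
    """Return the date strings in text that are real calendar dates."""
    valid = []
    for candidate in DATE_RE.findall(text):
        try:
            # strptime raises ValueError for things like day 0 or February 31.
            datetime.strptime(candidate, "%B, %d, %Y")
        except ValueError:
            continue
        valid.append(candidate)
    return valid


def common_dates(dates_a, dates_b):
    """Return the dates from dates_a that also appear in dates_b, in order."""
    seen_in_b = set(dates_b)
    return [d for d in dates_a if d in seen_in_b]


if __name__ == "__main__":
    # Hypothetical sample text standing in for the contents of the file.
    sample = (
        "January, 1, 1990 and February, 29, 2012 are fine; "
        "February, 31, 2000 and January, 0, 1999 are not. "
        "December, 25, 1945 is fine too."
    )
    found = find_dates(sample)
    print(found)
    print(common_dates(found, ["December, 25, 1945", "March, 3, 2003"]))
```

Note that this compares the strings themselves, so "January, 01, 1990" and "January, 1, 1990" would count as different dates. If that matters for your data, compare the parsed datetime objects instead of the raw strings.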