So essentially I am trying to read lines from multiple files in a directory and using a regex to specifically find the beginnings of a sort of time stamp, I want to also place an instance of a list of months within the regex and then create a counter for each month based on how many times it appears. I have some code below, but it is still a work in progress. I know I closed off date_parse, but I that’s why I’m asking. And please leave another suggestion if you can think of a more efficient method. thanks.
months = ['Jan','Feb','Mar','Apr','May','Jun',\
'Jul','Aug','Sep','Oct','Nov',' Dec']
date_parse = re.compile('[Date:\s]+[[A-Za-z]{3},]+[[0-9]{1,2}\s]')
counter=0
for line in sys.stdin:
if data_parse.match(line):
for month in months in line:
print '%s %d' % (month, counter)
In a regular expression, you can have a list of alternative patterns, separated using vertical bars.
http://docs.python.org/library/re.html
Some notes:
I recommend you use a raw string (with
r''orr"") for a pattern, so that backslashes will not become string escapes. For example, inside a normal string,\sis not an escape and you will get a backslash followed by an ‘s’, but\nis an escape and you will get a single character (a newline).In a regular expression, when you enclose a series of characters in square brackets, you get a “character class” that matches any of the characters. So when you put
[Date:\s]+you would matchDate:but you would also matchtaD:eor any other combination of those characters. It’s perfectly okay to just put in a string that should match itself, likeDate:.