So essentially I am trying to read lines from multiple files in a directory


Asked: June 1, 2026 (2026-06-01T22:21:50+00:00)


I am trying to read lines from multiple files in a directory and use a regex to find the beginning of a kind of timestamp. I also want to put a list of months into the regex and keep a counter for each month based on how many times it appears. I have some code below, but it is still a work in progress. I know I closed off date_parse, but that's why I'm asking. Please also leave a suggestion if you can think of a more efficient method. Thanks.

months = ['Jan','Feb','Mar','Apr','May','Jun',\
          'Jul','Aug','Sep','Oct','Nov','  Dec']
date_parse = re.compile('[Date:\s]+[[A-Za-z]{3},]+[[0-9]{1,2}\s]')
counter=0
for line in sys.stdin:
    if data_parse.match(line):
        for month in months in line:
            print '%s %d' % (month, counter)


1 Answer

Editorial Team · Answer 1 · 2026-06-01T22:21:51+00:00

In a regular expression, you can list alternative patterns separated by vertical bars (|), and the match succeeds if any one of them matches.

http://docs.python.org/library/re.html

import re
import sys
from collections import defaultdict

date_parse = re.compile(r'Date:\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)')

# month abbreviation -> number of matching lines
c = defaultdict(int)

for line in sys.stdin:
    m = date_parse.match(line)
    if m is None:
        # pattern did not match
        # could handle error or log it here if desired
        continue # skip to handling next input line
    month = m.group(1)
    c[month] += 1
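The question also asked about putting the list of months into the regex and printing a count per month. Here is a minimal sketch of one way to do that, building the alternation with '|'.join. The helper names count_months and format_counts are made up for illustration, and the sample lines are invented; they assume the month follows "Date:" directly, as the pattern above does.

```python
import re
from collections import defaultdict

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Build the alternation from the list instead of typing it out by hand.
DATE_PARSE = re.compile(r'Date:\s+(' + '|'.join(MONTHS) + r')')


def count_months(lines):
    """Return a dict mapping month abbreviation -> number of matching lines."""
    counts = defaultdict(int)
    for line in lines:
        m = DATE_PARSE.match(line)  # match() only matches at the start of the line
        if m is not None:
            counts[m.group(1)] += 1
    return counts


def format_counts(counts):
    """Return one 'Mon N' string per month that was seen, in calendar order."""
    return ['%s %d' % (month, counts[month]) for month in MONTHS if month in counts]


# Invented sample input (hypothetical log lines).
sample = [
    'Date: Jan 12 2011 10:00:00',
    'Date: Feb 3 2011 09:30:00',
    'Subject: not a date line',
    'Date: Jan 30 2011 17:45:00',
]
for row in format_counts(count_months(sample)):
    print(row)
# prints:
# Jan 2
# Feb 1
```

count_months accepts any iterable of lines, so you can pass it sys.stdin, or fileinput.input() from the standard library to read every file named on the command line (for example, python script.py logs/*).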

Some notes:

I recommend you use a raw string (with r'' or r"") for a pattern, so that backslashes will not become string escapes. For example, inside a normal string, \s is not an escape and you will get a backslash followed by an ‘s’, but \n is an escape and you will get a single character (a newline).
In a regular expression, when you enclose a series of characters in square brackets, you get a “character class” that matches any of the characters. So when you put [Date:\s]+ you would match Date: but you would also match taD:e or any other combination of those characters. It’s perfectly okay to just put in a string that should match itself, like Date:.
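Both notes are easy to check interactively. The short sketch below compares a raw and a normal string, then runs the character-class pattern from the question against the literal pattern; the variable names and test strings are only for illustration.

```python
import re

# Raw vs. normal strings: r'\n' is two characters, '\n' is one (a newline).
raw_len = len(r'\n')      # backslash + 'n'
normal_len = len('\n')    # a single newline character
print(raw_len, normal_len)

# Character class: one or more of D, a, t, e, ':' or whitespace, in any order.
char_class = re.compile(r'[Date:\s]+')
# Literal text: exactly "Date:" followed by at least one whitespace character.
literal = re.compile(r'Date:\s+')

for text in ['Date: Jan', 'taD:e Jan']:
    print(text, bool(char_class.match(text)), bool(literal.match(text)))
# prints:
# 2 1
# Date: Jan True True
# taD:e Jan True False
```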
