We’re writing a Python script to parse application logfiles.
Most of the logfiles will follow a similar format:
09:05:00.342344343 [DEBUG] [SOME_APPLICATION] [SOME_FUNCTION] Lorem ipsum dolor sic amet
We have a variety of regular expressions to parse the different kinds of log lines that come through, each capturing the relevant fields as regex groups (timestamp, log level, originating app/function, and fields in the payload).
I’ve stored each of these regexes in a dict:
import re

foobar_patterns = {
    'pattern1': re.compile(r'blahblahblah'),
    'pattern2': re.compile(r'blahblahblahblah'),
}
However, there is obviously a fair bit of overlap between the patterns – the parts that extract the timestamp, log level, etc. are the same in each.
Is there a way to remove this redundancy? Can you build up the different regex strings somehow from a common template?
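To make the idea concrete, here is a minimal sketch of what building from a common template might look like. The prefix sub-patterns are guessed from the sample line above, and the payload patterns and group names (`host`, `order_id`) are made up purely for illustration:

```python
import re

# Shared prefix: timestamp, log level, application, function.
# These sub-patterns are assumptions based on the sample log line.
COMMON_PREFIX = (
    r'(?P<timestamp>\d{2}:\d{2}:\d{2}\.\d+) '
    r'\[(?P<level>\w+)\] '
    r'\[(?P<app>\w+)\] '
    r'\[(?P<function>\w+)\] '
)

# Hypothetical payload patterns; only this part differs per message type.
PAYLOADS = {
    'pattern1': r'Connected to (?P<host>\S+)',
    'pattern2': r'Order (?P<order_id>\d+) filled',
}

# Compile each full pattern once by prepending the shared prefix.
foobar_patterns = {
    name: re.compile(COMMON_PREFIX + payload)
    for name, payload in PAYLOADS.items()
}

line = ('09:05:00.342344343 [DEBUG] [SOME_APPLICATION] [SOME_FUNCTION] '
        'Order 42 filled')
match = foobar_patterns['pattern2'].match(line)
```

With named groups, the shared fields are then available under the same names (`match.group('timestamp')`, `match.group('level')`, and so on) regardless of which pattern matched.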
Extension – I’m looping through the lines in the file and, for each line, looping through each compiled regex. Then, depending on which one matches, there are different functions to handle each case – e.g. if we detect a certain type of message, we may need to search ahead three lines to find some other line, and extract a field from that.
I was thinking of storing a function in the foobar_patterns dict as well, and then when we hit a match, executing on that.
Is that a Pythonic way to do things?
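Roughly, the structure I have in mind is sketched below. The patterns, handler names, and the `(match, lines, index)` handler signature are all hypothetical, and it assumes the whole file is read into a list so that a handler can look ahead:

```python
import re

# Hypothetical pattern for the follow-up line a handler searches for.
DETAIL_RE = re.compile(r'detail=(?P<detail>\w+)')


def handle_simple(match, lines, index):
    # Everything needed is on the matching line itself.
    return {'kind': 'simple', 'value': match.group('value')}


def handle_lookahead(match, lines, index):
    # Search up to three lines ahead for the follow-up line.
    detail = None
    for later in lines[index + 1:index + 4]:
        found = DETAIL_RE.search(later)
        if found:
            detail = found.group('detail')
            break
    return {'kind': 'lookahead',
            'request': match.group('request'),
            'detail': detail}


# Each entry pairs a compiled regex with the function that handles it.
foobar_patterns = {
    'simple': (re.compile(r'value=(?P<value>\d+)'), handle_simple),
    'lookahead': (re.compile(r'request (?P<request>\w+) started'),
                  handle_lookahead),
}


def parse(lines):
    results = []
    for index, line in enumerate(lines):
        for pattern, handler in foobar_patterns.values():
            match = pattern.search(line)
            if match:
                results.append(handler(match, lines, index))
                break  # first matching pattern wins
    return results
```

Passing the full list and the current index to every handler keeps the signature uniform, so the main loop does not need to know which handlers look ahead.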
Cheers,
Victor
1 Answer