I wanted to create a python script to compare two log files (named foo.log

Question

Asked: June 17, 2026

I want to write a Python script that compares two log files (foo.log and bar.log) generated by a tool.
Some lines in these logs can be ignored for the comparison and others cannot.

I have written regular expressions that identify the lines that can be ignored.
This is my current implementation:

import re

p_1 = re.compile(...) # Pattern 1 to be ignored
p_2 = re.compile(...) # Pattern 2 to be ignored
p_3 = re.compile(...) # Pattern 3 to be ignored
...
p_n = re.compile(...) # Pattern n to be ignored

with open("foo.log", mode = 'r') as foo:
    with open("foo_temp.log", mode = 'w') as foo_temp:
        for foo_lines in foo:
            if p_1.match(foo_lines):
                continue
            elif p_2.match(foo_lines):
                continue
            elif p_3.match(foo_lines):
                continue
            ...
            ...
            ...
            elif p_n.match(foo_lines):
                continue
            else:
                foo_temp.write(foo_lines)

with open("bar.log", mode = 'r') as bar:
    with open("bar_temp.log", mode = 'w') as bar_temp:
        for bar_lines in bar:
            if p_1.match(bar_lines):
                continue
            elif p_2.match(bar_lines):
                continue
            elif p_3.match(bar_lines):
                continue
            ...
            ...
            ...
            elif p_n.match(bar_lines):
                continue
            else:
                bar_temp.write(bar_lines)

Running the script produces two files, foo_temp.log and bar_temp.log, which I then compare manually using WinMerge.
My questions are:

1) Can I optimize the way I am using the regular expressions? (I am new to regular expressions and suspect there is a lot to improve.)

2) Can I make it easier to add a new pattern to be ignored later on? (Currently I need to add the new pattern and then a check in two places: one for foo.log and one for bar.log.)

3) I have heard that generators are used when dealing with huge files. The logs I compare are relatively small (50 MB at most), but should I look into generators and perhaps incorporate them into my script?

4) Rather than creating foo_temp.log and bar_temp.log and checking them manually in WinMerge, can I compare the reduced files in Python itself? Something from the filecmp module, perhaps?

Note: Some lines that can be ignored appear out of order between the two logs. For example, foo.log might contain:

Line 1: <<Pattern_1: Ignore>>
Line 2: <<Pattern_2: Do not Ignore>>
Line 3: <<Pattern_3: Do not Ignore>>
Line 4: <<Pattern_4: Do not Ignore>>
Line 5: <<Pattern_5: Ignore>>

Whereas bar.log may have the following pattern:

Line 1: <<Pattern_1: Ignore>>
Line 2: <<Pattern_5: Ignore>>
Line 3: <<Pattern_2: Do not Ignore>>
Line 4: <<Pattern_3: Do not Ignore>>
Line 5: <<Pattern_4: Do not Ignore>>

Once the ignorable lines are removed, both files have the same pattern. For that reason, I decided to strip all ignorable lines in one pass and then compare the reduced files.

1 Answer

Editorial Team · Answer 1 · 2026-06-17T23:47:26+00:00

I don’t know what sort of comparisons you had in mind (difflib may come in handy), but as near as I can tell, your code reduces to:

import re

# Placeholder patterns; replace these with the real expressions to ignore.
raw_patterns = [r'aaa', r'bbb', r'ccc']
patterns = [re.compile(p) for p in raw_patterns]

names = "foo", "bar"

for name in names:
    with open(name + ".log", "r") as in_fp, open(name + "_temp.log", "w") as temp:
        for line in in_fp:
            # Keep the line only if no ignore pattern matches at its start.
            if not any(patt.match(line) for patt in patterns):
                temp.write(line)

The principle is known as DRY, for “don’t-repeat-yourself”. Whenever you find yourself repeating code, you can consider abstracting the repetition away. In this case, we can put the patterns and the filenames in lists, and then iterate over those.
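
If the list of patterns grows, one possible refinement is to join them into a single alternation so each line is tested against one compiled expression, and to keep the raw patterns in a plain text file so that adding one means adding a line. This is only a sketch: the helper names (compile_ignore_pattern, load_raw_patterns, is_ignored), the one-regex-per-line file layout, and the '#' comment convention are my assumptions, not anything your tool prescribes. It also assumes the individual patterns do not rely on leading inline flags such as (?i), which may not be accepted in the middle of a combined expression.

```python
import re


def compile_ignore_pattern(raw_patterns):
    """Join several regex strings into one compiled alternation."""
    # Each pattern is wrapped in a non-capturing group so that a '|'
    # inside one pattern cannot leak into its neighbours.
    return re.compile("|".join(f"(?:{p})" for p in raw_patterns))


def load_raw_patterns(path):
    """Read one regex per line, skipping blank lines and '#' comment lines.

    The file layout is an assumption made for this sketch.
    """
    with open(path) as fp:
        stripped = (line.strip() for line in fp)
        return [s for s in stripped if s and not s.startswith("#")]


# Placeholder patterns, as in the snippet above; the real ones would go
# here or in a text file read with load_raw_patterns().
ignore = compile_ignore_pattern([r"aaa", r"bbb", r"ccc"])


def is_ignored(line):
    # match() anchors at the start of the line, like the original code.
    return ignore.match(line) is not None
```

With this in place, the filtering loop would test `if not is_ignored(line)` instead of looping over the list. Whether it is measurably faster depends on the patterns, so it is worth timing on a real log before committing to it.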

—

On the generator side, it’s true: you don’t need to construct an intermediate file. You can make an object which will simply yield only the lines you care about, and then iterate over that instead. For example:

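from itertools import zip_longest

def informative_lines(filename):
    # Lazily yield only the lines that match none of the ignore patterns.
    # `patterns` is the list of compiled regexes defined in the earlier snippet.
    with open(filename) as infile:
        for line in infile:
            if not any(patt.match(line) for patt in patterns):
                yield line

# zip_longest pads the shorter stream with None, so a difference in
# length also shows up as a mismatching pair.
paired_lines = zip_longest(informative_lines("foo.log"),
                           informative_lines("bar.log"))

for i, (line0, line1) in enumerate(paired_lines):
    if line0 != line1:
        print('mismatch at non-ignored line #', i)  # i counts from 0
        print(line0)
        print(line1)
        raise Exception("problem!")

print("hooray, files matched!")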
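
If you would rather see every difference than stop at the first mismatch, difflib from the standard library can produce a diff of the filtered lines, which may stand in for the WinMerge step. The following is a rough sketch rather than a drop-in replacement: the function names keep_informative and diff_logs are made up for illustration, the patterns are the same placeholders as before, and it reads the filtered lines into lists, which should be acceptable for logs of around 50 MB but gives up the laziness of the generator version.

```python
import difflib
import re

# Hypothetical placeholder patterns; substitute the real ignore patterns.
patterns = [re.compile(p) for p in (r"aaa", r"bbb", r"ccc")]


def keep_informative(lines):
    """Return the lines that match none of the ignore patterns."""
    return [ln for ln in lines if not any(p.match(ln) for p in patterns)]


def diff_logs(path_a, path_b):
    """Return unified-diff lines for the two logs after filtering.

    An empty list means the non-ignored lines are identical.
    """
    with open(path_a) as fa, open(path_b) as fb:
        a = keep_informative(fa)
        b = keep_informative(fb)
    return list(difflib.unified_diff(a, b, fromfile=path_a, tofile=path_b))
```

Calling `diff_logs("foo.log", "bar.log")` and printing `"".join(...)` of the result would show the differences in the usual unified diff layout. filecmp, by contrast, only reports whether two files on disk are equal, so it would still need the temporary files and would not say where they differ.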
