The Archive Base

Editorial Team
Asked: June 17, 2026

I wanted to create a Python script to compare two log files (named foo.log and bar.log) generated by a tool.
The log files contain some lines that can be ignored for the comparison and some that cannot.

I have created regular expressions that identify the lines that can be ignored.
Here is how I implemented my code:

import re

p_1 = re.compile(...) # Pattern 1 to be ignored
p_2 = re.compile(...) # Pattern 2 to be ignored
p_3 = re.compile(...) # Pattern 3 to be ignored
...
p_n = re.compile(...) # Pattern n to be ignored

with open("foo.log", mode = 'r') as foo:
    with open("foo_temp.log", mode = 'w') as foo_temp:
        for foo_lines in foo:
            if p_1.match(foo_lines):
                continue
            elif p_2.match(foo_lines):
                continue
            elif p_3.match(foo_lines):
                continue
            ...
            ...
            ...
            elif p_n.match(foo_lines):
                continue
            else:
                foo_temp.write(foo_lines)

with open("bar.log", mode = 'r') as bar:
    with open("bar_temp.log", mode = 'w') as bar_temp:
        for bar_lines in bar:
            if p_1.match(bar_lines):
                continue
            elif p_2.match(bar_lines):
                continue
            elif p_3.match(bar_lines):
                continue
            ...
            ...
            ...
            elif p_n.match(bar_lines):
                continue
            else:
                bar_temp.write(bar_lines)

After running the script, I get two files, foo_temp.log and bar_temp.log, which I then compare manually using WinMerge.
My questions are:

1) Is there any way I can optimize how I am using the regular expressions? (I am new to regular expressions, and I have a feeling that a lot can be optimized in that respect.)

2) Later on, when I need to add a new pattern to be ignored, can I make it easier, from a user's perspective, to add it? (Currently I need to add the new pattern and then add a check in two places: one for foo.log and one for bar.log.)

3) I have heard of generators being used when dealing with huge files. The logs I compare are relatively small (50 MB max), but should I still look into how generators are used and perhaps incorporate them into my script?

4) Rather than creating foo_temp.log and bar_temp.log and checking them manually in WinMerge, is there a way to compare the reduced files in Python itself? Something from the filecmp module, perhaps?

Note: some lines that can be ignored appear in a different order in the two logs. For example, foo.log might contain the following:

Line 1: <<Pattern_1: Ignore>>
Line 2: <<Pattern_2: Do not Ignore>>
Line 3: <<Pattern_3: Do not Ignore>>
Line 4: <<Pattern_4: Do not Ignore>>
Line 5: <<Pattern_5: Ignore>>

Whereas bar.log might contain the following:

Line 1: <<Pattern_1: Ignore>>
Line 2: <<Pattern_5: Ignore>>
Line 3: <<Pattern_2: Do not Ignore>>
Line 4: <<Pattern_3: Do not Ignore>>
Line 5: <<Pattern_4: Do not Ignore>>

Note that once the lines that can be ignored are removed, both files contain the same lines in the same order. For that reason, I decided to strip all ignorable lines in one pass and then compare the reduced files.

1 Answer

    Editorial Team
    Added an answer on June 17, 2026 at 11:47 pm

    I don’t know what sort of comparisons you had in mind (difflib may come in handy), but as far as I can tell, your code reduces to:

    import re
    
    # Placeholder patterns: replace these with your real ignore patterns.
    raw_patterns = [r'aaa', r'bbb', r'ccc']
    patterns = [re.compile(p) for p in raw_patterns]
    
    names = "foo", "bar"
    
    for name in names:
        # Copy every line that matches none of the ignore patterns to <name>_temp.log.
        with open(name + ".log", "r") as in_fp, open(name + "_temp.log", "w") as temp:
            for line in in_fp:
                if not any(patt.match(line) for patt in patterns):
                    temp.write(line)
    

    The principle is known as DRY, for “don’t-repeat-yourself”. Whenever you find yourself repeating code, you can consider abstracting the repetition away. In this case, we can put the patterns and the filenames in lists, and then iterate over those.
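One further step, not part of the original answer, is to join all the ignore patterns into a single compiled regex using alternation. Each line then needs only one `match` call instead of a Python-level loop over many patterns (which may be faster), and adding a new pattern (question 2) stays a one-line change to the list. This is a minimal sketch assuming that, as with `re.match`, the patterns only need to match at the start of a line, and that they contain no numbered backreferences or global inline flags such as `(?i)`, which would not combine cleanly. The pattern strings and the helper names `build_ignore_regex` and `filter_lines` are hypothetical placeholders.

```python
import re
from typing import Iterable, Iterator

# Hypothetical placeholder patterns; adding a new one is a one-line change.
RAW_PATTERNS = [r"aaa", r"bbb", r"ccc"]


def build_ignore_regex(raw_patterns: Iterable[str]) -> re.Pattern:
    """Combine patterns into one regex of the form (?:p1)|(?:p2)|...

    Each pattern is wrapped in a non-capturing group so that a '|'
    inside one pattern cannot leak into its neighbours.
    """
    return re.compile("|".join(f"(?:{p})" for p in raw_patterns))


def filter_lines(lines: Iterable[str], ignore: re.Pattern) -> Iterator[str]:
    """Lazily yield only the lines whose start does not match `ignore`."""
    for line in lines:
        if not ignore.match(line):
            yield line


if __name__ == "__main__":
    ignore = build_ignore_regex(RAW_PATTERNS)
    sample = ["aaa skip me\n", "keep me\n", "bbb skip\n", "x aaa kept\n"]
    # prints ['keep me\n', 'x aaa kept\n']
    print(list(filter_lines(sample, ignore)))
```

Because `filter_lines` is a generator, it can wrap an open file object directly (for example `filter_lines(infile, ignore)`) without building an intermediate list.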

    —

    On the generator side, it’s true: you don’t need to construct an intermediate file. You can make an object which will simply yield only the lines you care about, and then iterate over that instead. For example:

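    from itertools import zip_longest
    
    # Reuses the `patterns` list compiled in the previous snippet.
    def informative_lines(filename):
        """Yield only the lines of `filename` that match none of the ignore patterns."""
        with open(filename) as infile:
            for line in infile:
                if not any(patt.match(line) for patt in patterns):
                    yield line
    
    # zip_longest pads the shorter file with None, so extra lines also count as a mismatch.
    paired_lines = zip_longest(informative_lines("foo.log"),
                               informative_lines("bar.log"))
    
    for i, (line0, line1) in enumerate(paired_lines):
        if line0 != line1:
            print('mismatch at non-ignored line #', i)
            print(line0)
            print(line1)
            raise Exception("problem!")
    
    print("hooray, files matched!")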
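For question 4, the `difflib` module mentioned at the top of the answer can report where the filtered logs differ, roughly like WinMerge, without writing temporary files; `filecmp.cmp` can only tell you whether two files are identical. This is a minimal sketch assuming the same start-of-line ignore semantics as above; the patterns and the helpers `keep_informative` and `diff_filtered` are hypothetical. Note that `difflib` works on whole sequences, so this builds lists rather than streaming with a generator; for logs of around 50 MB that should usually be acceptable, although the comparison itself can get slow when the files differ a lot.

```python
import difflib
import re
from typing import Iterable, List

# Hypothetical ignore patterns, joined into one regex; re.match anchors it at line start.
IGNORE = re.compile("|".join(f"(?:{p})" for p in [r"aaa", r"bbb"]))


def keep_informative(lines: Iterable[str]) -> List[str]:
    """Return the lines whose start matches none of the ignore patterns."""
    return [line for line in lines if not IGNORE.match(line)]


def diff_filtered(a_lines: Iterable[str], b_lines: Iterable[str],
                  name_a: str = "foo.log", name_b: str = "bar.log") -> List[str]:
    """Unified diff of the non-ignored lines; an empty list means they match."""
    return list(difflib.unified_diff(keep_informative(a_lines),
                                     keep_informative(b_lines),
                                     fromfile=name_a, tofile=name_b))


# In-memory demo mirroring the question: ignorable lines appear in a different order.
foo = ["aaa ignore\n", "keep 1\n", "keep 2\n", "bbb ignore\n"]
bar = ["aaa ignore\n", "bbb ignore\n", "keep 1\n", "keep 2\n"]
report = diff_filtered(foo, bar)
print("".join(report) if report else "no differences")
```

To compare the real logs, pass open file objects instead of the demo lists, e.g. `diff_filtered(fa, fb)` inside `with open("foo.log") as fa, open("bar.log") as fb:`; iterating a text file yields lines with their newlines kept, which is what `unified_diff` expects by default.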
    
Related Questions

This is my first script, and I am trying to compare two genome files,
I wanted to create one js file which includes every js files to attach
I wanted to learn how to create python packages, so I visited http://docs.python.org/distutils/index.html .
Say I wanted to create an array (NOT list) of 1,000,000 twos in python,
I wanted to create a redis cache in python, and as any self respecting
I am trying to create an enumeration in python. I have seen seen several
I wanted to create my own Python exception class, like this: class MyException(BaseException): def
I am looking to create a simple nested lookup mechanism in python, and wanted
I wanted to create smth similar to this one Kansas county map where user
I wanted to create a new property on a table in my model.. Basically

© 2021 The Archive Base. All Rights Reserved