I need to parse a large log file (flat file), which contains two column

Asked: May 20, 2026

I need to parse a large log file (a flat file) that contains two columns of values (column-A, column-B).

Values in both columns repeat. For each unique value in column-A, I need to find the set of column-B values.

Can this be done with a Unix shell command, or do I need to write a Perl or Python script? What are the ways to do this?

Example:

xxxA 2
xxxA 1
xxxB 2
XXXC 3
XXXA 3
xxxD 4

output:

xxxA - 2,1,3
xxxB - 2
xxxC - 3
xxxD - 4

1 Answer

Editorial Team · Answer 1 · 2026-05-20T00:12:13+00:00

I would use a Python dictionary in which the keys are column A values and the values are sets (Python’s built-in set type) holding the column B values:

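def parse_the_file():
    # Bind the str methods to local names to avoid repeated lookups in the loop.
    lower = str.lower
    split = str.split
    d = {}
    with open('f.txt') as f:
        for line in f:  # iterate line by line instead of reading the whole file
            fields = split(line)
            if len(fields) != 2:
                continue  # skip blank or malformed lines
            A, B = fields
            try:
                d[lower(A)].add(B)
            except KeyError:
                # {B} stores the whole string; set(B) would split it into characters
                d[lower(A)] = {B}

    for a in d:
        print("%s - %s" % (a, ",".join(d[a])))

if __name__ == "__main__":
    parse_the_file()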

The advantage of using a dictionary is that you’ll have a single dictionary key per column A value. The advantage of using a set is that you’ll have a unique set of column B values.
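
The same dictionary-of-sets idea can also be sketched with collections.defaultdict, which creates the empty set on first access so no explicit check for new keys is needed. This is only an illustration: the function name group_values and the choice to pass in a list of lines (rather than a file name) are assumptions, not part of the answer above.

```python
from collections import defaultdict

def group_values(lines):
    """Map each lowercased column A value to the set of its column B values."""
    groups = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        a, b = fields
        groups[a.lower()].add(b)
    return dict(groups)

# The example input from the question.
sample = ["xxxA 2", "xxxA 1", "xxxB 2", "XXXC 3", "XXXA 3", "xxxD 4"]
result = group_values(sample)

# Sort keys and values so the printed order is stable.
for key in sorted(result):
    print("%s - %s" % (key, ",".join(sorted(result[key]))))
```

Because a file object is itself an iterable of lines, an open file could be passed in place of the sample list.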

Efficiency notes:

Using try/except is more efficient than an if/else check for the initial case when most A values have already been seen, because the exception is raised only the first time each key appears.
Binding the str functions to local names outside the loop is more efficient than looking them up inside the loop on every iteration.
Depending on the ratio of new A values to repeated A values in the file, consider computing a = lower(A) once before the try/except statement, so that the key is not lowercased a second time in the except branch.
I used a function because accessing local variables is faster in Python than accessing global variables.
Some of these performance tips are from here
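
The try/except versus if/else point can be measured rather than taken on faith. The following is a rough sketch using the standard timeit module; the function names with_try and with_if and the generated data are hypothetical, and the timings will vary with the Python version and with how often keys repeat in the real file.

```python
import timeit

def with_try(pairs):
    # Assume the key exists; handle the first occurrence in the except branch.
    d = {}
    for a, b in pairs:
        try:
            d[a].add(b)
        except KeyError:
            d[a] = {b}
    return d

def with_if(pairs):
    # Check for the key explicitly on every iteration.
    d = {}
    for a, b in pairs:
        if a in d:
            d[a].add(b)
        else:
            d[a] = {b}
    return d

# Hypothetical data: 10 distinct keys, each repeated many times.
pairs = [("key%d" % (i % 10), str(i % 7)) for i in range(20000)]

# Both variants must build the same mapping before timing them.
assert with_try(pairs) == with_if(pairs)

print("try/except:", timeit.timeit(lambda: with_try(pairs), number=5))
print("if/else:   ", timeit.timeit(lambda: with_if(pairs), number=5))
```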

Testing the code above on your input example yields the following (the order may differ between runs and Python versions, since sets are unordered):

xxxd - 4
xxxa - 1,3,2
xxxb - 2
xxxc - 3
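
The output requested in the question lists values in first-seen order (2,1,3), which a set does not preserve. If that order matters, one option is to use a plain dict as an insertion-ordered set. This is a minimal sketch that assumes Python 3.7 or later, where dicts keep insertion order; the function name group_in_order is hypothetical.

```python
def group_in_order(lines):
    """Group column B values per lowercased column A value, in first-seen order."""
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        a, b = fields
        # setdefault creates the inner dict the first time a key is seen;
        # the inner dict's keys act as an insertion-ordered set of B values.
        groups.setdefault(a.lower(), {})[b] = None
    return {a: list(bs) for a, bs in groups.items()}

# The example input from the question.
sample = ["xxxA 2", "xxxA 1", "xxxB 2", "XXXC 3", "XXXA 3", "xxxD 4"]
ordered = group_in_order(sample)
for a, bs in ordered.items():
    print("%s - %s" % (a, ",".join(bs)))
# xxxa - 2,1,3
# xxxb - 2
# xxxc - 3
# xxxd - 4
```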
