I need to parse a large log file (a flat file) that contains two columns of values (column-A, column-B).
Values in both columns repeat. For each unique value in column-A, I need to find the set of column-B values that appear with it.
Can this be done with a Unix shell command, or do I need to write a Perl or Python script? What are the ways to do this?
Example:
xxxA 2
xxxA 1
xxxB 2
XXXC 3
XXXA 3
xxxD 4
Output:
xxxA - 2,1,3
xxxB - 2
xxxC - 3
xxxD - 4
I would use a Python dictionary where the keys are column A values and the values are instances of Python’s built-in set type holding the column B values.
The advantage of using a dictionary is that you’ll have a single dictionary key per column A value. The advantage of using a set is that you’ll have a unique set of column B values.
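A minimal sketch of this approach, assuming the columns are separated by whitespace and that column A should be compared case-insensitively (the example groups `XXXA` with `xxxA`). The function name `group_column_b` and the `sample` list are hypothetical, chosen only for illustration; note that the keys come out lowercased and that a set does not preserve the order in which values were seen.

```python
def group_column_b(lines):
    """Map each lowercased column-A value to the set of column-B values seen with it."""
    groups = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        a, b = parts
        a = a.lower()  # normalise case once, before the try/except
        try:
            groups[a].add(b)
        except KeyError:
            # first time this column-A value is seen
            groups[a] = {b}
    return groups


# Hypothetical sample mirroring the example input in the question
sample = ["xxxA 2", "xxxA 1", "xxxB 2", "XXXC 3", "XXXA 3", "xxxD 4"]
result = group_column_b(sample)

for key in sorted(result):
    print(key, "-", ",".join(sorted(result[key])))
```

Because the function only iterates over its argument, an open file object can be passed in place of the list, so a large log is read line by line rather than loaded into memory at once.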
Efficiency notes:
a = lower(A) before the try catch statement
Testing the code above on your input example yields: