I have two CSV files (three columns) which I need to compare and extract

Asked: May 30, 2026 (2026-05-30T15:51:24+00:00)

I have two CSV files. I need to compare the three columns of the first file against the other file (six columns in the example below) and extract the rows that match. Examples of the files:

File1:

ATGCGCGACAGT, ch3, 123546
ATGCATACAGGATAT, ch2, 5141561615

… and so on, approx. 100 entries

File2:

ATGCGGCGACAGT,ch3, 123456,mi141515, AUCAGCUAUAUAU, UACGCAGAUAUAUA
ATCAGACGATTATGA, ch4, 4564764, mi653453, AUCAGCAAUUUUCG, AUACAGACAAAAA

… and so on, approx. 50000 entries

I need to match columns 1, 2 and 3 of both files, such that all three columns of file1 match the corresponding columns of file2. If they do, I want to extract columns 4, 5 and 6 for further processing.

I was thinking of:

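import csv

# file1 and file2 are assumed to be lists of already-parsed rows
# (a csv.reader would be exhausted after the first pass of the inner loop)
with open('parsed_out', 'w', newline='') as out:
    fhout = csv.writer(out, delimiter=',')
    for row1 in file1:
        a, b, c = row1[0], row1[1], row1[2]
        for row2 in file2:
            d, e, f = row2[0], row2[1], row2[2]
            g, h, i = row2[3], row2[4], row2[5]
            if a == d and b == e and c == f:
                fhout.writerow([g, h, i])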

But somebody told me that I could use hashing, or some better approach, rather than writing such big loops for 10,000 or more entries in file1.

Please suggest a better way to achieve this. Both file1 and file2 are parsed from more complex files.

1 Answer

Editorial Team · Answer 1 · 2026-05-30T15:51:25+00:00

Try something like:

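import csv

# Collect every row of file 1 as a tuple so rows can be compared directly.
file_1_tuples = []

with open("file_1.csv", newline="") as fh:
    csv_reader = csv.reader(fh)
    for row in csv_reader:
        file_1_tuples.append(tuple(row))

# Print columns 4-6 of each file 2 row whose first three columns appear in file 1.
with open("file_2.csv", newline="") as fh:
    csv_reader = csv.reader(fh)
    for row in csv_reader:
        if tuple(row[0:3]) in file_1_tuples:
            print(row[3:6])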

When run with the following data:

file_1.csv

person, john, smith
person, anne, frank
person, bob, macdonald
fruit, orange, banana
fruit, strawberry, fields
fruit, ringring, banana

file_2.csv

person, john, smith, 1, 2, 3
person, anne, frank, 4, 5, 6
person, bob, macdonald, 7, 8, 9

it produces the output

[' 1', ' 2', ' 3']
[' 4', ' 5', ' 6']
[' 7', ' 8', ' 9']

EDIT: A slightly nicer implementation using sets and list comprehensions:

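import csv
import pprint

# A set gives constant-time membership tests on average, unlike a list.
with open("file_1.csv", newline="") as fh:
    csv_reader = csv.reader(fh)
    file_1_tuples = {tuple(row) for row in csv_reader}

# Keep each full file 2 row whose first three columns appear in file 1.
with open("file_2.csv", newline="") as fh:
    csv_reader = csv.reader(fh)
    matched_rows = [row for row in csv_reader if tuple(row[:3]) in file_1_tuples]

pprint.pprint(matched_rows)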
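
If the matching columns should go to a file rather than be printed, as in the question's parsed_out attempt, the set-based lookup could be wrapped roughly as follows. This is only a sketch: the function name write_matches, the key_len parameter, and the throwaway demo files are assumptions made for illustration, not part of the original answer.

```python
import csv
import os
import tempfile


def write_matches(file1_path, file2_path, out_path, key_len=3):
    """Write the non-key columns of every file 2 row whose first
    key_len columns also appear as a row prefix in file 1.
    Returns the number of rows written."""
    with open(file1_path, newline="") as fh:
        # Hash the key columns of file 1 once; blank lines are skipped.
        keys = {tuple(row[:key_len]) for row in csv.reader(fh) if row}

    count = 0
    with open(file2_path, newline="") as src, \
            open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if tuple(row[:key_len]) in keys:
                writer.writerow(row[key_len:])
                count += 1
    return count


# Demo on temporary files (hypothetical data modelled on the sample above).
with tempfile.TemporaryDirectory() as tmp:
    p1 = os.path.join(tmp, "file_1.csv")
    p2 = os.path.join(tmp, "file_2.csv")
    out = os.path.join(tmp, "parsed_out")
    with open(p1, "w", newline="") as fh:
        fh.write("person, john, smith\nfruit, orange, banana\n")
    with open(p2, "w", newline="") as fh:
        fh.write("person, john, smith, 1, 2, 3\nperson, anne, frank, 4, 5, 6\n")
    n_written = write_matches(p1, p2, out)
    with open(out, newline="") as fh:
        written_rows = list(csv.reader(fh))

print(n_written, written_rows)
```

File 2 is streamed row by row, so only the keys from the smaller file are held in memory.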

EDIT 2: Note that this implementation is sensitive to whitespace within the CSV files. If the spacing in your CSV files is inconsistent, use something like row = [element.strip(' ') for element in row] to strip the leading and trailing spaces from each field before comparing.
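
To illustrate the whitespace point, here is a small sketch comparing three ways of reading the same line. The sample line is made up for the demonstration. The skipinitialspace option of csv.reader is a standard parameter, but it only drops spaces that directly follow a delimiter, so trailing spaces still need an explicit strip.

```python
import csv

# A made-up line with inconsistent spacing around the fields.
lines = ["person, john , smith"]

# Default parsing keeps every space as part of the field.
default_row = next(csv.reader(lines))

# skipinitialspace=True drops spaces right after each comma only.
skip_row = next(csv.reader(lines, skipinitialspace=True))

# Stripping each field removes both leading and trailing whitespace.
stripped_row = [field.strip() for field in default_row]

print(default_row)
print(skip_row)
print(stripped_row)
```

Applying the same normalization to both files before building the set and before each lookup keeps the comparison consistent.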
