I have two csv files: one is 98 mb and the other one is


Asked: May 27, 2026 (2026-05-27T22:06:02+00:00)


I have two CSV files: one is 98 MB and the other is 152 KB. The smaller file is a random subset of the bigger one, and I want to write a third file containing the rows of the big CSV that correspond to each line in the smaller CSV file.

Big file (excerpt):

ZINC_ID MWT LogP    Desolv_apolar   Desolv_polar    HBD HBA tPSA    Charge  NRB SMILES
ZINC00000017    281.337 1.33    3.07    -19.2   2   6   87  0   4   CCC[S@](=O)c1ccc2c(c1)[nH]/c(=N/C(=O)OC)/[nH]2
ZINC00000036    151.141 0.37    3.51    -45.3   1   3   60  -1  2   c1ccc(cc1)[C@@H](C(=O)[O-])O
ZINC00000048    222.24  2.42    3.78    -8.68   0   4   37  0   4   COc1cc(c(c2c1OCO2)OC)CC=C
ZINC00000053    179.151 1.43    6.59    -56.84  0   4   66  -1  3   CC(=O)Oc1ccccc1C(=O)[O-]

Small File (excerpt):

SMILES
CCOc1ccc(cc1)NC(=O)C[C@@H](C)O
C[C@@H](c1ccc2c(c1)nc(o2)c3ccc(cc3)Cl)C(=O)[O-]
CC(=O)Oc1ccccc1C(=O)[O-]
COc1cc(c(c2c1OCO2)OC)CC=C

Here is my code:

import csv

writer = csv.writer(open('/Users/Eric/Desktop/newZincSubset.csv','wb'))
count = 0
with open('/Users/Eric/Desktop/test700.csv','rU') as i:
    with open('/Users/Eric/Desktop/initial_data.csv','rU') as j:
        subject = csv.reader(i)
        reference = csv.reader(j)
        for row in subject:
            smiles = row[0]
            for reference_row in reference:
                suspect = reference_row[10]
                if (smiles == suspect):
                    writer.writerow(reference_row)

It seems to write the header (ZINC_ID MWT LogP) just fine, but then it stops searching for the remaining lines. Is it a memory issue, or is something wrong with my code?

Thanks!


1 Answer

Editorial Team · Answer 1 · 2026-05-27T22:06:03+00:00

A CSV reader can be iterated only once, because it consumes the underlying file object. After the first pass of the inner loop, the reference file has been read to the end, so when you iterate over the reference reader a second time there is nothing left to read. That is why only the first row of the small file (the header) ever finds a match.
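To see the exhaustion in isolation, here is a minimal sketch in Python 3 (the question's code is Python 2 style). The in-memory data and the variable names are made up for illustration and stand in for the real files.

```python
import csv
import io

# In-memory stand-in for the reference file; the rows are illustrative only.
data = io.StringIO("a,1\nb,2\n")
reader = csv.reader(data)

first_pass = list(reader)   # consumes every row
second_pass = list(reader)  # empty: the underlying file is already at EOF

# Rewinding the underlying file makes the rows available again.
data.seek(0)
third_pass = list(reader)

print(first_pass, second_pass, third_pass)
```

Rewinding with `seek(0)` inside the outer loop would make the nested loops work, but it means re-reading the 98 MB file once for every row of the small file.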

I’d recommend that you first read the small file into a dictionary, and then iterate over the larger file once, checking each row against the data in memory. If you key the dictionary by the value you will be looking up (the SMILES string, which is reference_row[10] in the big file, I think), there is no need for nested loops.
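A rough sketch of that approach, assuming Python 3. It uses a set rather than a dictionary, since only membership is needed. The function name `write_matching_rows` and its parameters are hypothetical, and the default delimiter and column index (10) are assumptions taken from the question's code; pass `delimiter="\t"` if the files turn out to be tab-separated.

```python
import csv

def write_matching_rows(small_path, big_path, out_path,
                        smiles_column=10, delimiter=","):
    """Copy rows of the big file whose SMILES value appears in the small file.

    Hypothetical helper: the column index and delimiter are assumptions.
    Returns the number of rows written.
    """
    # Load the first column of the small file into a set for fast lookups.
    with open(small_path, newline="") as small:
        wanted = {row[0] for row in csv.reader(small, delimiter=delimiter) if row}

    # Stream the big file once, keeping rows whose SMILES column is wanted.
    written = 0
    with open(big_path, newline="") as big, \
         open(out_path, "w", newline="") as out:
        writer = csv.writer(out, delimiter=delimiter)
        for row in csv.reader(big, delimiter=delimiter):
            if len(row) > smiles_column and row[smiles_column] in wanted:
                writer.writerow(row)
                written += 1
    return written
```

Because the small file's header value (SMILES) ends up in the set, the big file's header row matches as well and is copied to the output. Rows come out in the order of the big file, and only the small file is held in memory.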
