I’m iterating through a very large tab-delimited file (containing millions of lines) and pairing

Question


Asked: May 20, 2026


I’m iterating through a very large tab-delimited file (containing millions of lines) and pairing different lines of it based on the value of some field in that file, e.g.

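from collections import defaultdict
FIELD = 0  # hypothetical: index of the tab-separated column to group by
mydict = defaultdict(list)  # needs a factory; plain defaultdict() raises KeyError on .append
for line in myfile:
    # Group all lines that have the same field value into a list
    key = line.rstrip("\n").split("\t")[FIELD]
    mydict[key].append(line)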

Since “mydict” gets very large, I’d like to turn it into an iterator so I don’t have to hold it all in memory. Instead of populating a dictionary, how can I create an iterator that I can loop through to get each list of lines sharing the same field value?

Thanks.


1 Answer

Editorial Team · Answer 1 · 2026-05-20T22:11:29+00:00


“Millions of lines” is not very large unless the lines are long. If the lines are long, you might save memory by storing only each line's position in the file (from .tell()) and reading the line back later with .seek().
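A minimal sketch of the offset idea, assuming the key is a tab-separated column picked by a hypothetical `field` index; the function names are made up for illustration. It opens the file in binary mode and counts byte offsets by hand, because calling `tell()` while iterating a text-mode file in Python 3 raises `OSError`.

```python
from collections import defaultdict
from typing import Dict, Iterator, List


def index_offsets(path: str, field: int) -> Dict[bytes, List[int]]:
    """Map each key (tab-separated column number `field`) to the byte
    offsets of the lines containing it; the lines themselves are not kept."""
    offsets: Dict[bytes, List[int]] = defaultdict(list)
    pos = 0
    # Binary mode, so offsets are exact byte positions usable with seek().
    # The running offset is tracked by hand instead of calling tell().
    with open(path, "rb") as f:
        for line in f:
            key = line.rstrip(b"\r\n").split(b"\t")[field]
            offsets[key].append(pos)
            pos += len(line)
    return offsets


def iter_groups(path: str, offsets: Dict[bytes, List[int]]) -> Iterator[List[bytes]]:
    """Yield one list of raw lines per key, re-reading each line with seek()."""
    with open(path, "rb") as f:
        for positions in offsets.values():
            group = []
            for pos in positions:
                f.seek(pos)
                group.append(f.readline())
            yield group
```

Usage (file name hypothetical): `for lines in iter_groups("data.tsv", index_offsets("data.tsv", 0)): ...`. The index still holds one integer per line, so this only pays off when the lines are much longer than an offset, and the random seeks can be slow on spinning disks.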

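If the file is sorted by line.field, you could use itertools.groupby(), which yields one run of consecutive lines with equal keys at a time, so only the current group has to be held in memory.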
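A minimal sketch of the groupby approach, assuming the key is the tab-separated column at a hypothetical `field` index. It is a generator, so it gives exactly the “iterator over lists of lines” the question asks for, provided the input is already sorted by that column.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def groups_by_field(lines: Iterable[str], field: int) -> Iterator[Tuple[str, List[str]]]:
    """Yield (key, lines) for each run of consecutive lines sharing a key.

    This is a complete grouping only if the input is sorted by that field;
    on unsorted input the same key can show up in several separate runs.
    """
    def key_of(line: str) -> str:
        return line.rstrip("\n").split("\t")[field]

    for key, run in groupby(lines, key=key_of):
        # Only the current run is materialised, never the whole file.
        yield key, list(run)
```

Usage (file name hypothetical): `with open("data.tsv") as f:` then `for key, lines in groups_by_field(f, 0): ...`. If the file is not sorted, it has to be sorted by that column first (for example with an external sort), otherwise a key can appear more than once.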

For moderately sized files, SQL’s GROUP BY might help (e.g., loading the file into SQLite, as @wisty suggested).
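A minimal sketch of the SQLite route using Python's built-in `sqlite3` module, assuming the key is the tab-separated column at a hypothetical `field` index; the table and column names (`tsv_lines`, `grp_key`, `line_text`) are made up for illustration. Passing a file path as `db_path` keeps the data on disk rather than in RAM. Because it uses `group_concat`, each group is still built as one string, and the order of lines within a group is not guaranteed.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple


def sqlite_groups(lines: Iterable[str], field: int,
                  db_path: str = ":memory:") -> Iterator[Tuple[str, List[str]]]:
    """Load lines into SQLite and yield (key, lines) per key via GROUP BY.

    Lines come back without their trailing newline.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DROP TABLE IF EXISTS tsv_lines")
        conn.execute("CREATE TABLE tsv_lines (grp_key TEXT, line_text TEXT)")
        # Strip the newline, then store (key column, whole line) for every line.
        conn.executemany(
            "INSERT INTO tsv_lines VALUES (?, ?)",
            ((s.split("\t")[field], s) for s in (raw.rstrip("\n") for raw in lines)),
        )
        conn.commit()
        # One result row per key; the lines of that key are joined with "\n".
        cur = conn.execute(
            "SELECT grp_key, group_concat(line_text, ?) FROM tsv_lines GROUP BY grp_key",
            ("\n",),
        )
        for key, joined in cur:
            yield key, joined.split("\n")
    finally:
        conn.close()
```

A streaming alternative is `SELECT ... ORDER BY grp_key` with `itertools.groupby` over the cursor, which lets the database do the sorting and avoids building one large string per group.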

For really large files, you could use MapReduce, whose shuffle phase groups the emitted (key, value) pairs by key across machines.
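Real MapReduce runs on a cluster framework (Hadoop is the classic example). What follows is only a single-machine imitation of its map/shuffle/reduce idea, swapped in for illustration: lines are hash-partitioned by key into temporary bucket files, then each bucket is grouped on its own. The function name and `n_buckets` are hypothetical; the assumption is that enough buckets are chosen for the largest one to fit in memory.

```python
import os
import tempfile
import zlib
from collections import defaultdict
from typing import Iterable, Iterator, List, Tuple


def partitioned_groups(lines: Iterable[str], field: int,
                       n_buckets: int = 16) -> Iterator[Tuple[str, List[str]]]:
    """Group lines by key using hash partitioning on local disk.

    Every line with a given key lands in the same bucket file, so each bucket
    can be grouped in memory independently, one bucket at a time.
    """
    def key_of(line: str) -> str:
        return line.rstrip("\n").split("\t")[field]

    with tempfile.TemporaryDirectory() as tmp:
        paths = [os.path.join(tmp, f"bucket{i}.tsv") for i in range(n_buckets)]
        outs = [open(p, "w", encoding="utf-8") for p in paths]
        try:
            # "Map"/shuffle: route each line to a bucket chosen by a stable hash of its key.
            for line in lines:
                bucket = zlib.crc32(key_of(line).encode("utf-8")) % n_buckets
                outs[bucket].write(line if line.endswith("\n") else line + "\n")
        finally:
            for out in outs:
                out.close()

        # "Reduce": group one bucket at a time; no key spans two buckets.
        for p in paths:
            groups = defaultdict(list)
            with open(p, encoding="utf-8") as f:
                for line in f:
                    groups[key_of(line)].append(line)
            yield from groups.items()
```

`zlib.crc32` is used instead of the built-in `hash()` so the bucket assignment does not depend on Python's per-process string-hash randomization.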
