I have a large file (100 million lines of tab separated values – about

Question


Asked: May 24, 2026


I have a large file (100 million lines of tab-separated values – about 1.5 GB in size). What is the fastest known way to sort this based on one of the fields?

I have tried Hive. I would like to see whether this can be done faster using Python.


1 Answer

Editorial Team · Answer 1 · 2026-05-24T18:02:22+00:00

Have you considered using the *nix sort program? In raw terms, it'll probably be faster than most Python scripts.

Use -t $'\t' to specify that the file is tab-separated, -k n to specify the sort field, where n is the 1-based field number, and -o outputfile to write the result to a file instead of standard output.
Example:

sort -t $'\t' -k 4 -o sorted.txt input.txt

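This sorts input.txt using a key that starts at its 4th field and writes the result to sorted.txt. To limit the key to the 4th field alone, write -k 4,4; to compare that field as a number rather than as text, write -k 4,4n.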
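
Since the question asks about Python, here is a minimal sketch of driving the same sort command from a Python script. The function name sort_tsv and its parameters are hypothetical, chosen only for illustration. Setting LC_ALL=C is an assumption that plain byte-order comparison is acceptable for your data; it usually makes sort faster but changes how non-ASCII text is ordered. Whether this beats Hive on your machine is something to measure, not something this sketch guarantees.

```python
import os
import subprocess


def sort_tsv(input_path, output_path, field, numeric=False):
    """Sort a tab-separated file on one 1-based field with the external sort program.

    sort_tsv and its arguments are illustrative names, not part of any library.
    """
    # "N,N" limits the key to field N alone; a trailing "n" requests numeric comparison.
    key = f"{field},{field}"
    if numeric:
        key += "n"

    # Pass a literal tab as the separator; no shell is involved, so $'\t' is not needed.
    cmd = ["sort", "-t", "\t", "-k", key, "-o", output_path, input_path]

    # Assumption: byte-order (C locale) comparison is fine for this data.
    env = dict(os.environ, LC_ALL="C")

    # check=True raises CalledProcessError if sort exits with an error.
    subprocess.run(cmd, check=True, env=env)
```

Calling sort_tsv("input.txt", "sorted.txt", 4) corresponds to the command above with the key restricted to the 4th field. The sorting itself still happens inside sort, which spills to temporary files when the data does not fit in memory, so the Python process never has to hold the whole file.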
