Points:
- We process thousands of flat files a day, concurrently.
- Memory is a major constraint.
- We use one thread per file.
- We don’t sort by individual columns. Each line (record) in the file is compared as a whole, as if it were a single column.
Can’t Do:
- We cannot use the Unix/Linux sort command.
- We cannot use any database system, no matter how lightweight.
Now, we cannot just load everything into a collection and sort it in memory. That would eat up all the memory and the program would fail with a heap (out-of-memory) error.
In that situation, how would you sort the records/lines in a file?
It looks like what you are looking for is external sorting.
Basically, you sort small chunks of data first, write each sorted chunk back to disk, and then merge the sorted chunks, reading only one line from each at a time, to produce the fully sorted output.
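Here is a minimal sketch of that chunk-then-merge idea. It is written in Python purely for illustration, since the question does not name a language; the function names, the `max_lines` parameter, and the UTF-8 encoding are all assumptions, not anything from your setup. It reads at most `max_lines` records at a time, sorts each chunk in memory, writes it to a temporary "run" file, and then does a k-way merge over the runs with `heapq.merge`, which holds only one line per run in memory.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack
from itertools import islice


def _key(line):
    # Compare records without the trailing newline, so the line terminator
    # never influences the order (hypothetical helper for this sketch).
    return line.rstrip("\n")


def _write_sorted_run(lines, tmp_dir):
    """Sort one chunk in memory and write it to a temporary run file."""
    lines.sort(key=_key)
    fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".run", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as run:
        run.writelines(lines)
    return path


def external_sort(src_path, dst_path, max_lines=100_000, tmp_dir=None):
    """Sort the lines of src_path into dst_path using bounded memory.

    max_lines is an assumed tuning knob: it caps how many records are
    held in memory at once during the chunk-sorting phase.
    """
    run_paths = []
    try:
        # Phase 1: split the input into sorted runs on disk.
        with open(src_path, "r", encoding="utf-8") as src:
            while True:
                chunk = list(islice(src, max_lines))
                if not chunk:
                    break
                # The last line of the file may lack a newline; add one so
                # every record stays on its own line after merging.
                if not chunk[-1].endswith("\n"):
                    chunk[-1] += "\n"
                run_paths.append(_write_sorted_run(chunk, tmp_dir))

        # Phase 2: k-way merge of all runs, streaming line by line.
        with ExitStack() as stack:
            runs = [
                stack.enter_context(open(p, "r", encoding="utf-8"))
                for p in run_paths
            ]
            with open(dst_path, "w", encoding="utf-8") as dst:
                dst.writelines(heapq.merge(*runs, key=_key))
    finally:
        # Remove the temporary run files whether or not the sort succeeded.
        for p in run_paths:
            os.remove(p)
```

With one thread per file, peak memory is roughly the number of concurrent threads times the size of one chunk, so `max_lines` would need to be tuned against that. This sketch also opens every run at once during the merge; for very large files that could hit the open-file limit, in which case the runs can be merged in several passes.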