I have multiple text files containing log entries that I need to parse later on. Each file is up to 1M in size, and there are approximately 10 files.
Each line has the following format:
Timestamp\tData
I have to merge all files and sort the entries by the timestamp value. There is no guarantee that the entries within a single file are in chronological order.
What would be the smartest approach? My pseudocode looks like this:
List<FileEntry> oneBigList = new ArrayList<FileEntry>();
for each file {
    for each line in the file {
        parse the line into an instance of FileEntry;
        add the instance to oneBigList;
    }
}
Collections.sort(oneBigList according to FileEntry.getTimestamp());
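For reference, the pseudocode above could be fleshed out roughly as follows. This is only a sketch under assumptions the post does not state: the timestamp is taken to be a number (for example epoch milliseconds) that fits in a `long`, the files are UTF-8, and the names `LogMerger`, `sortLines`, `mergeAndSort` and `FileEntry.parse` are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class FileEntry {
    private final long timestamp;
    private final String data;

    FileEntry(long timestamp, String data) {
        this.timestamp = timestamp;
        this.data = data;
    }

    long getTimestamp() { return timestamp; }
    String getData() { return data; }

    // Assumption: a line looks like "<numeric timestamp>\t<data>".
    static FileEntry parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("No tab separator in line: " + line);
        }
        long ts = Long.parseLong(line.substring(0, tab).trim());
        return new FileEntry(ts, line.substring(tab + 1));
    }
}

final class LogMerger {
    // Parses the given lines and returns the entries sorted by timestamp.
    static List<FileEntry> sortLines(List<String> lines) {
        List<FileEntry> entries = new ArrayList<>();
        for (String line : lines) {
            if (!line.isEmpty()) {          // skip blank lines
                entries.add(FileEntry.parse(line));
            }
        }
        // List.sort is stable, so entries with equal timestamps keep their read order.
        entries.sort(Comparator.comparingLong(FileEntry::getTimestamp));
        return entries;
    }

    // Reads every file fully into memory, then sorts all lines in one go.
    static List<FileEntry> mergeAndSort(List<Path> files) throws IOException {
        List<String> allLines = new ArrayList<>();
        for (Path file : files) {
            allLines.addAll(Files.readAllLines(file, StandardCharsets.UTF_8));
        }
        return sortLines(allLines);
    }
}
```

If the real timestamps are formatted dates rather than plain numbers, only `FileEntry.parse` would need to change; the merge and sort steps stay the same.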
If you are not sure that your task will fit into available memory, you are better off inserting the parsed lines into a database table and letting the database worry about how to order the data (an index on the timestamp column will help 🙂).
If you are sure memory is no problem, I would use a TreeMap to do the sorting while I add the lines to it. Make sure your FileEntry class implements hashCode(), equals() and Comparable according to your sort order.
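A minimal sketch of the sorted-collection idea, with a few caveats. It uses a `TreeSet` (which is backed by a `TreeMap`) rather than a `TreeMap` directly, because the entry itself is the key. One thing to watch: if `compareTo` looks at the timestamp alone, two entries with the same timestamp count as duplicates and the second one is silently dropped. The `sequence` field below is a hypothetical tie-breaker added for that reason, and `SortedLog` is an illustrative name; the timestamp is assumed to be a `long`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.TreeSet;

final class FileEntry implements Comparable<FileEntry> {
    private final long timestamp;
    private final long sequence;   // hypothetical tie-breaker: order in which the line was read
    private final String data;

    FileEntry(long timestamp, long sequence, String data) {
        this.timestamp = timestamp;
        this.sequence = sequence;
        this.data = data;
    }

    long getTimestamp() { return timestamp; }
    String getData() { return data; }

    // Order by timestamp first, then by read order so equal timestamps do not collide.
    @Override
    public int compareTo(FileEntry other) {
        int byTime = Long.compare(timestamp, other.timestamp);
        return byTime != 0 ? byTime : Long.compare(sequence, other.sequence);
    }

    // equals() and hashCode() use the same fields as compareTo(), keeping them consistent.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof FileEntry)) return false;
        FileEntry other = (FileEntry) o;
        return timestamp == other.timestamp && sequence == other.sequence;
    }

    @Override
    public int hashCode() {
        return Objects.hash(timestamp, sequence);
    }
}

final class SortedLog {
    private final TreeSet<FileEntry> entries = new TreeSet<>();
    private long nextSequence = 0;

    // Each insertion keeps the set sorted.
    void add(long timestamp, String data) {
        entries.add(new FileEntry(timestamp, nextSequence++, data));
    }

    // Returns the entries in ascending timestamp order.
    List<FileEntry> inOrder() {
        return new ArrayList<>(entries);
    }
}
```

Compared with filling an `ArrayList` and sorting once at the end, this keeps the data ordered at every step, which mainly pays off if you need to read the entries in order while still adding to them.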