Can’t think of a neat way of doing this in Java:
I’m streaming sets of strings from a file, line by line.
s1 s2 s3
s4 s5
s6 s7 s8 s9 s10
...
I load each line into a TreeSet, do some analysis, throw it away and move on to the next line. I can fit the contents of any individual line in memory, but not the whole file.
Now I want to maintain the top 5 biggest sets of strings I’ve encountered in the scan so far (storing nothing else).
I’m thinking of a PriorityQueue with a SetSizeComparator, doing an add/poll once the queue reaches a size of 5. Anyone got a neater solution?
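A minimal sketch of the PriorityQueue idea from the question, assuming each line holds whitespace-separated strings. SetSizeComparator is the question's own name, implemented here as a plain size comparison; TopSets, topSets and the blank-line skipping are illustrative assumptions. Because the comparator puts the smallest set at the head, each poll evicts the smallest candidate, so at most k + 1 sets are held at once.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.TreeSet;

public class TopSets {

    // Orders sets by size, smallest first, so the queue's head is the
    // smallest of the sets currently kept.
    static final class SetSizeComparator implements Comparator<Set<String>> {
        @Override
        public int compare(Set<String> a, Set<String> b) {
            return Integer.compare(a.size(), b.size());
        }
    }

    // Streams the input line by line and keeps only the k largest sets.
    static List<Set<String>> topSets(BufferedReader in, int k) throws IOException {
        PriorityQueue<Set<String>> queue = new PriorityQueue<>(k + 1, new SetSizeComparator());
        String line;
        while ((line = in.readLine()) != null) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            Set<String> set = new TreeSet<>(Arrays.asList(trimmed.split("\\s+")));
            // ... per-line analysis would go here ...
            queue.add(set);
            if (queue.size() > k) {
                queue.poll(); // evict the smallest, keeping at most k sets
            }
        }
        // Copy out largest first; the queue itself is only heap-ordered.
        List<Set<String>> result = new ArrayList<>(queue);
        result.sort(new SetSizeComparator().reversed());
        return result;
    }

    public static void main(String[] args) throws IOException {
        String data = "s1 s2 s3\ns4 s5\ns6 s7 s8 s9 s10\n";
        List<Set<String>> top = topSets(new BufferedReader(new StringReader(data)), 2);
        System.out.println(top);
    }
}
```

Calling topSets(reader, 5) on a BufferedReader over the real file gives the top 5 sets, largest first.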
(I can’t brain today. I have the dumb…)
Create a tuple, say LineTuple, consisting of a line and its string frequency (the number of strings in that line's set).
Keep a min-heap of LineTuples, ordered by a comparator on the frequency values.
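A sketch of the LineTuple and min-heap described above, assuming "string frequency" means the number of distinct strings on the line (the size of its set). The record form (Java 16+) and the names LineTupleHeap, of and newMinHeap are illustrative, not from the answer. java.util.PriorityQueue is already a min-heap with respect to its comparator, so no custom heap is needed.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

public class LineTupleHeap {

    // A line together with its frequency: the number of distinct
    // whitespace-separated strings it contains.
    record LineTuple(String line, int frequency) {
        static LineTuple of(String line) {
            String trimmed = line.trim();
            int count = trimmed.isEmpty() ? 0
                    : (int) Arrays.stream(trimmed.split("\\s+")).distinct().count();
            return new LineTuple(line, count);
        }
    }

    // Comparing by frequency puts the line with the fewest strings at the head.
    static PriorityQueue<LineTuple> newMinHeap() {
        return new PriorityQueue<>(Comparator.comparingInt(LineTuple::frequency));
    }

    public static void main(String[] args) {
        PriorityQueue<LineTuple> heap = newMinHeap();
        heap.add(LineTuple.of("s1 s2 s3"));
        heap.add(LineTuple.of("s4 s5"));
        heap.add(LineTuple.of("s6 s7 s8 s9 s10"));
        System.out.println(heap.peek()); // the two-string line "s4 s5"
    }
}
```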
Insert the first k lines into the heap.
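From the (k+1)st line onwards, if the line's frequency is greater than the frequency at the root of the heap (the smallest of the k lines kept so far), remove the root (O(lg k)) and insert the new line's tuple (O(lg k)).
At any point in time, the k tuples contained in the heap are the k biggest lines seen so far. I am not fluent in Java, so I can’t provide a code sample. But check here, here.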
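A minimal Java sketch of these steps, assuming frequency is the number of distinct whitespace-separated strings on a line. TopKLines, LineTuple.of and topK are hypothetical names. Lines arrive through an Iterator<String>, so only the k retained tuples stay in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

public class TopKLines {

    // Hypothetical tuple: a line plus the number of distinct strings on it.
    record LineTuple(String line, int frequency) {
        static LineTuple of(String line) {
            String trimmed = line.trim();
            int count = trimmed.isEmpty() ? 0
                    : (int) Arrays.stream(trimmed.split("\\s+")).distinct().count();
            return new LineTuple(line, count);
        }
    }

    // Returns the k lines with the highest frequency, largest first.
    static List<LineTuple> topK(Iterator<String> lines, int k) {
        List<LineTuple> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        // Min-heap on frequency: the root is the smallest of the k kept so far.
        PriorityQueue<LineTuple> heap =
                new PriorityQueue<>(k, Comparator.comparingInt(LineTuple::frequency));
        while (lines.hasNext()) {
            LineTuple t = LineTuple.of(lines.next());
            if (heap.size() < k) {
                heap.add(t);                 // first k lines: just insert
            } else if (t.frequency() > heap.peek().frequency()) {
                heap.poll();                 // drop the smallest, O(lg k)
                heap.add(t);                 // insert the new line, O(lg k)
            }
            // otherwise the line is discarded; the heap never exceeds k tuples
        }
        result.addAll(heap);
        result.sort((a, b) -> Integer.compare(b.frequency(), a.frequency()));
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("s1 s2 s3", "s4 s5", "s6 s7 s8 s9 s10", "s11");
        for (LineTuple t : topK(input.iterator(), 2)) {
            System.out.println(t.frequency() + ": " + t.line());
        }
    }
}
```

With a real file, passing reader.lines().iterator() from a BufferedReader should keep memory bounded, since BufferedReader.lines() is lazily populated. Checking against the root before touching the heap also skips the add-then-poll round trip for lines that cannot make the cut.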