Given k sorted inverted lists, I want an efficient algorithm to compute the union of these k lists.
Each inverted list is a read-only array in memory, and each list contains integers in sorted order.
The result will be saved in a preallocated array that is large enough. Is there any algorithm better than k-way merge?
K-way merge is optimal. It takes
O(n log k) operations [where n is the number of elements in all lists combined].
It is easy to see that it cannot be done better: as @jpalecek mentioned, otherwise you could sort any array faster than
O(n log n) by splitting it into chunks [inverted indexes] of size 1.
Note: this assumes that the output [resulting array] must be sorted. This assumption is true for most
applications that use inverted indexes, especially in the
Information-Retrieval area. This feature [sorted indexes] allows
elegant and quick intersection of indexes.
Since the result is a union, make sure that if an element appears in two lists, it is
added only once [easy to do by simply checking the last element in
the target array before adding].
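
To make the idea concrete, here is a minimal sketch of a heap-based k-way merge with the duplicate check described above. The function name `union_sorted` is made up for illustration, and it appends to a Python list rather than writing into a preallocated array, so treat it as one possible way to do it, not the only one.

```python
import heapq


def union_sorted(lists):
    """Union of k sorted integer lists via a heap-based k-way merge (illustrative sketch)."""
    # Each heap entry is (value, index of the list, position inside that list).
    heap = [(lst[0], i, 0) for i, lst in enumerate(lists) if lst]
    heapq.heapify(heap)

    out = []
    while heap:
        value, i, pos = heapq.heappop(heap)
        # The output is sorted, so a duplicate can only equal the last element written.
        if not out or out[-1] != value:
            out.append(value)
        # Advance in the list the popped value came from.
        if pos + 1 < len(lists[i]):
            heapq.heappush(heap, (lists[i][pos + 1], i, pos + 1))
    return out
```

The heap never holds more than k entries, so each of the n elements costs O(log k), which matches the bound above. Python's standard library also has `heapq.merge`, which does the merge step lazily but does not remove duplicates.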