This is an interview question. I have K machines each of which is connected to 1 central machine. Each of the K machines have an array of 4 byte numbers in file. You can use any data structure to load those numbers into memory on those machines and they fit. Numbers are not unique across K machines. Find the K largest numbers in the union of the numbers across all K machines. What is the fastest I can do this?
Share
Update: to make it clear, the combine step is not a sort. You just pick the top k numbers from the results. There are many ways to do this efficiently. You can use a heap for example, pushing the head of each list. Then you can remove the head from the heap and push the head from the list the element belonged to. Doing this k times gives you the result. All this is O(k*log(k)).