We know that, in general, the “smarter” comparison sorts on arbitrary data run in worst case complexity O(N * log(N)).
My question is what happens if we are asked not to sort a collection, but a stream of data. That is, values are given to us one by one with no indicator of what comes next (other than that the data is valid/in range). Intuitively, one might think that it is superior then to sort data as it comes in (like picking up a poker hand one by one) rather than gathering all of it and sorting later (sorting a poker hand after it’s dealt). Is this actually the case?
Gathering and sorting would be O(N + N * log(N)) = O(N * log(N)). However if we sort it as it comes in, it is O(N * K), where K = time to find the proper index + time to insert the element. This complicates things, since the value of K now depends on our choice of data structure. An array is superior in finding the index but wastes time inserting the element. A linked list can insert more easily but cannot binary search to find the index.
Is there a complete discussion on this issue? When should we use one method or another? Might there be a desirable in-between strategy of sorting every once in a while?
Balanced tree sort has
O(N log N)complexity and maintains the list in sorted order while elements are added.