I have an array of elements. This array could be:
- Randomly shuffled (about 20% of the time)
- Nearly sorted* in ascending order (about 40% of the time)
- Nearly sorted in descending order (about 40% of the time)
But I do not know (in advance) which of these cases applies. I would prefer to sort the array into the order which it is already close to.
It does not matter whether the output is ascending or descending, but it must be one or the other (so I can perform a binary search on it.)
The sort need not be stable.
Some background info: The process goes roughly like this:
- Populate the array
- Sort on some attribute A
- Do some processing (compute quantiles, and some other minor stuff)
- Sort on some other attribute B
- Do more processing
- Sort on attribute C
- Do more processing
A and B are often correlated with each other (either positively or negatively). The same applies to B and C. Occasionally A == C.
* “nearly sorted” here means most elements are close to their final positions, but rarely exactly at them (there is a lot of additive noise, and there are not many long sorted subsequences). Still, there are usually a few “outliers” at the start and end of the array which are poor predictors of the order for the next sort.
Is there an algorithm that can take advantage of the fact that I have no preference for ascending vs. descending order, to sort more cheaply than the Timsort I am currently using?
I’d continue using Timsort (a good alternative is Smoothsort*), but first probe the array to decide whether to sort in ascending or descending order. Compare the first and last elements and sort in whichever direction they suggest. If the array is randomly shuffled, the choice is immaterial; if it is (partially) sorted, comparing elements that are far apart is more likely to detect the direction correctly, since the noise is small relative to the distance between them.
*Smoothsort has the same best, average, and worst case time as Timsort (O(n), O(n log n), and O(n log n) respectively), and better space complexity (O(1) auxiliary space versus O(n)). Like Timsort, it was specifically designed to take advantage of partially sorted data.
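The probing step described above can be sketched as follows. This is only an illustration under some assumptions: it uses Python, whose built-in `list.sort` is a Timsort, and the names `guess_descending`, `sort_either_way`, and the `probes` parameter are made up for this example. Because the question mentions outliers at the ends of the array, the sketch compares several widely separated pairs and takes a majority vote rather than trusting only the first and last elements; that is a variation on the suggestion, not part of it.

```python
def guess_descending(items, key=lambda x: x, probes=9):
    """Guess the direction by voting over pairs that are half an array apart.

    Hypothetical helper: returns True if the data looks descending.
    """
    n = len(items)
    if n < 2:
        return False
    half = n // 2
    votes = 0
    for i in range(probes):
        lo = (i * half) // probes   # spread probe starts over the first half
        hi = lo + half              # partner is half the array further on
        a, b = key(items[lo]), key(items[hi])
        if a > b:
            votes += 1              # this pair suggests descending order
        elif a < b:
            votes -= 1              # this pair suggests ascending order
    return votes > 0


def sort_either_way(items, key=lambda x: x):
    """Sort in place in whichever direction the probe suggests.

    Returns True if the result is descending, so that a later binary
    search knows which comparison to use.
    """
    descending = guess_descending(items, key)
    items.sort(key=key, reverse=descending)
    return descending


# Usage: the two end elements are swapped "outliers", yet the vote
# still picks ascending order.
data = list(range(100))
data[0], data[99] = data[99], data[0]
is_descending = sort_either_way(data)
```

The probe costs a fixed number of comparisons regardless of array size, so a wrong guess on a shuffled array costs essentially nothing, while a right guess on nearly sorted data lets Timsort work in its favourable direction.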