I’m currently developing stochastic optimization algorithms and have run into the following problem (which I imagine also appears elsewhere). It could be called a totally unstable partial sort:
Given a container of size n and a comparator under which entries may compare as equal, return the best k entries; if values are equal, it should be (nearly) equally probable to receive any of them.
(Output order is irrelevant to me, i.e. equal values that lie entirely among the best k need not be shuffled. Having all equal values shuffled is, however, a related and interesting question, and would suffice!)
A very (!) inefficient way would be to use `random_shuffle` and then `partial_sort`, but one actually only needs to shuffle the block of equally valued entries “at the selection border” (or all blocks of equally valued entries; both are much faster). Maybe that observation is where to start…
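For reference, the inefficient baseline described above might look roughly like the sketch below. The name `naive_best_k` and its signature are made up for illustration, and it uses `std::shuffle` with an explicit generator in place of `random_shuffle`; it is only meant to pin down the behaviour that a faster solution should reproduce.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical baseline (not a recommended solution): shuffle everything,
// then partially sort. Afterwards v[0..k) holds the best k entries under cmp,
// and entries that compare equal at the selection border are picked at random.
template <typename T, typename Compare, typename URBG>
void naive_best_k(std::vector<T>& v, std::size_t k, Compare cmp, URBG&& rng) {
    assert(k <= v.size());
    // Randomize the relative order of all entries, including equal ones: O(n).
    std::shuffle(v.begin(), v.end(), rng);
    // Bring the best k entries to the front, in sorted order: O(n log k).
    std::partial_sort(v.begin(),
                      v.begin() + static_cast<std::ptrdiff_t>(k),
                      v.end(), cmp);
}
```

The shuffle touches all n entries and the partial sort orders the first k, although neither is strictly needed when output order is irrelevant.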
I would very much prefer a solution built from STL algorithms (or at least largely so), because they are usually very fast, well encapsulated and OMP-parallelized.
Thanks in advance for any ideas!
If you really mean that output order is irrelevant, then you want `std::nth_element` rather than `std::partial_sort`, since it is generally somewhat faster. Note that `std::nth_element` puts the nth element in the right position, so you can do the following, which is 100% standard algorithm invocations (warning: not tested very well; fencepost error possibilities abound):

The function takes three iterators, like `nth_element`, where `nth` is an iterator to the nth element, which means that it is `begin() + (n - 1)`.

Edit: Note that this is different from most STL algorithms, in that it is effectively an inclusive range. In particular, it is UB if `nth == limit`, since it is required that `*nth` be valid. Furthermore, there is no way to request the best 0 elements, just as there is no way to ask for the 0th element with `std::nth_element`. You might prefer it with a different interface; do feel free to do so.

Or you might call it like this, after requiring that `0 < k <= n`:

It first uses `nth_element` to put the “best” `k` elements in positions `0..k-1`, guaranteeing that the kth element (or one of them, anyway) is at position `k-1`. It then repartitions the elements preceding position `k-1` so that the equal elements are at the end, and the elements following position `k-1` so that the equal elements are at the beginning. Finally, it shuffles the equal elements.

`nth_element` is `O(n)` on average; the two `partition` operations sum up to `O(n)`; and `random_shuffle` is `O(r)`, where `r` is the number of equal elements shuffled. I think that all sums up to `O(n)`, so it’s optimally scalable, but it may or may not be the fastest solution.

Note: You should use `std::shuffle` instead of `std::random_shuffle`, passing a uniform random number generator through to `best_n`. But I was too lazy to write all the boilerplate to do that and test it. Sorry.
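The steps described above (one `nth_element`, two `partition` calls, one shuffle of the equal block) can be sketched as follows. This is a reconstruction from the description, not necessarily the code originally posted. It follows the note and takes a generator for `std::shuffle`; the extra `rng` parameter and the wrapper `shuffle_best_k` are assumptions added for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Sketch: rearrange [first, limit) so that [first, nth] (inclusive!) holds the
// best elements under cmp, choosing at random among elements equal to *nth.
// Precondition: nth != limit, because *nth must be dereferenceable.
template <typename RandomIt, typename Compare, typename URBG>
void best_n(RandomIt first, RandomIt nth, RandomIt limit, Compare cmp, URBG&& rng) {
    assert(nth != limit);
    // 1. Put the nth-best element at *nth; nothing before it is worse,
    //    nothing after it is better.
    std::nth_element(first, nth, limit, cmp);
    // 2. Within [first, nth): strictly better elements first, so the
    //    elements equivalent to *nth end up in [p, nth).
    auto p = std::partition(first, nth,
                            [&](const auto& a) { return cmp(a, *nth); });
    // 3. Within (nth, limit): elements equivalent to *nth first, so they
    //    end up in [nth + 1, q); strictly worse elements follow.
    auto q = std::partition(nth + 1, limit,
                            [&](const auto& a) { return !cmp(*nth, a); });
    // 4. [p, q) is now the whole block equivalent to *nth; shuffle only that.
    std::shuffle(p, q, rng);
}

// Hypothetical convenience wrapper: best k entries of a vector, 0 < k <= n.
template <typename T, typename Compare, typename URBG>
void shuffle_best_k(std::vector<T>& v, std::size_t k, Compare cmp, URBG&& rng) {
    assert(0 < k && k <= v.size());
    best_n(v.begin(), v.begin() + static_cast<std::ptrdiff_t>(k - 1),
           v.end(), cmp, rng);
}
```

A call such as `shuffle_best_k(v, k, cmp, rng)` with a `std::mt19937 rng` leaves the selected entries in `v[0..k)` in unspecified order. The lambdas use `const auto&` parameters, so this needs C++14 or later.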