I have a sorted set (std::set to be precise) that contains elements with an assigned weight. I want to randomly choose N elements from this set, while the elements with higher weight should have a bigger probability of being chosen. Any element can be chosen multiple times.
I want to do this as efficiently as possible – I want to avoid any copying of the set (it might get very large) and run at O(N) time if it is possible. I’m using C++ and would like to stick to a STL + Boost only solution.
Does anybody know if there is a function in STL/Boost that performs this task? If not, how to implement one?
You need to calculate (and possibly cache, if you think of performance) the sum of all weights in your set. Then, generate N random numbers ranging up to this value. Finally, iterate your set, counting the sum of the weights you encountered so far. Inspect all the (remaining) random numbers. If the number falls between the previous and the next value of the sum, insert the value from the set and remove your random number. Stop when your list of random numbers is empty or you’ve reached the end of the set.