I’m currently working on a problem that requires the random selection of an element from a set. Each of the elements has a weight(selection probability) associated with it.
My problem is that for sets with a small number of elements say 5-10, the complexity (running time) of the solution I was is acceptable, however as the number of elements increases say for 1K or 10K etc, the running time becomes unacceptable.
My current strategy is:
- Select random value X with range [0,1)
- Iterate elements summing their weights until the sum is greater than X
- The element which caused the sum to exceed X is chosen and returned
For large sets and a large number of selections this process begins to exhibit quadratic behavior, in short is there a faster way? a better algorithm perhaps?
Assuming that the element weights are fixed, you can work with precomputed sums. This is like working with the cumulative probability function directly, rather than the density function.
The lookup can then be implemented as a binary search, and hence be log(N) in the number of elements.
A binary search obviously requires random_access to the container of the weights.
Alternatively, use a
std::map<>and theupper_bound()method.