Let’s say that I have a list of data: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} where n = 10 elements
I’d like to randomly choose k elements of this set to form a sublist, say k = 5.
In that case, I could end up with a sublist that looks like {9, 3, 5, 2, 7}
I could accomplish this by:
- Randomly determining an offset within the list, between 0 and the current size of the list minus 1
- Appending that element to my sublist
- Erasing that element from the original list
- Repeat until the desired size is found
The problem with this is that as the original list grows the offset and deletion time grows as well, and for any significantly large list (say over 1,000,000 elements), it takes quite a long time to perform this algorithm.
Is there a faster way to generate a random sequence from a list of given data? The implementation of the random number generator should be set aside for this problem, instead, focusing on how the RNG result is used in a proposed algorithm.
Any thoughts?
Right now I’m using the C++ STL list
Or you could accomplish this by:
the list, between 0 and the current
size of the list.
sublist.
I’m not sure why you want to delete the chosen elements from the main list, but if that is essential you could do it after constructing the sublist.
And I haven’t a clue about how the performance of this approach will rate against the performance of the of the suggested random_shuffle of a list of 10^6 elements.