I have a std::list which I am currently randomizing using a Fisher-Yates shuffle (see http://en.wikipedia.org/wiki/Fisher-Yates_shuffle). To summarize, my code carries out the following steps on the list:
- Loop through each element of the
list. - Swap the element with a randomly chosen element from the current position onwards, including itself.
Because lists don’t provide random access, this means that I am iterating over the entire list in step 1, and for each element I’m iterating again, on average over half the remaining elements from that point onwards. This is a major bottleneck in my program’s performance, so I’m looking to improve it. For other reasons I need to continue using list as my container, but I’m considering converting to a vector at the start of my randomize function, and then converting back to list at the end. My lists typically contain 300 – 400 items, so I would guess that the cost of conversion between containers will be worth it to avoid traversing the items sequentially.
My question is: does this seem like the best way to go about optimizing the code? Is there a better way?
One easy improvement is to copy the data into a vector, shuffle the vector, and copy it back into a list. That is what was suggested in comments by Max and PeskyGnat:
This implementation is pretty fast. But, it will do three passes over the vector, and you can get it down to two passes by implementing the shuffle yourself:
Since the first version is much easier to understand and much less error-prone, it’s almost always preferable… unless perhaps this bit of code is critical for your performance (and you confirmed that with measurement.)
EDIT: By the way, since you are looking at the Wikipedia article, the second code sample uses the “inside-out” variant of Fisher-Yates.