I have a set of objects in a Vector from which I’d like to select a random subset (e.g. 100 items coming back; pick 5 randomly). In my first (very hasty) pass I wrote an extremely simple and perhaps overly clever solution:
    Vector itemsVector = getItems();
    Collections.shuffle(itemsVector);
    itemsVector.setSize(5);
While this has the advantage of being nice and simple, I suspect it won’t scale very well, since Collections.shuffle() must be at least O(n). My less clever alternative is:
    Vector itemsVector = getItems();
    Random rand = new Random(System.currentTimeMillis()); // would make this static to the class
    List subsetList = new ArrayList(5);
    for (int i = 0; i < 5; i++) {
        // be sure to use Vector.remove() or you may get the same item twice
        subsetList.add(itemsVector.remove(rand.nextInt(itemsVector.size())));
    }
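The loop above removes items from the caller’s Vector, so that Vector is consumed by the selection. A minimal sketch, assuming you want to keep the original intact, runs the same remove-a-random-element loop over a copy. The class and method names (`RemoveSample`, `pick`) are hypothetical. Note that `remove(int)` shifts every later element left, so this costs O(k·n) element moves rather than O(k).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.Vector;

public class RemoveSample {
    // Shared generator, made static to the class as the question suggests.
    private static final Random RAND = new Random();

    /**
     * Picks k distinct items by repeatedly removing a random element from a copy.
     * The caller's list is not modified. Each remove(int) shifts later elements,
     * so the total work is O(k * n).
     */
    public static <T> List<T> pick(List<T> items, int k) {
        if (k < 0 || k > items.size()) {
            throw new IllegalArgumentException("k must be between 0 and items.size()");
        }
        List<T> pool = new ArrayList<>(items); // copy so the input survives
        List<T> subset = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            // Removing the chosen element guarantees it cannot be picked twice.
            subset.add(pool.remove(RAND.nextInt(pool.size())));
        }
        return subset;
    }

    public static void main(String[] args) {
        Vector<Integer> items = new Vector<>();
        for (int i = 0; i < 100; i++) {
            items.add(i);
        }
        System.out.println(pick(items, 5)); // five distinct values from 0..99
        System.out.println(items.size());   // still 100
    }
}
```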
Any suggestions on better ways to draw out a random subset from a Collection?
Jon Bentley discusses this in either ‘Programming Pearls’ or ‘More Programming Pearls’. You need to be careful with your N of M selection process, but I think the code shown works correctly. Rather than shuffling all M items, you can run the shuffle over only the first N positions, which is a useful saving when N << M.
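Shuffling only the first N positions amounts to stopping a Fisher-Yates shuffle after N steps. A minimal sketch of that idea follows; the class and method names (`PartialShuffle`, `sample`) are hypothetical. It copies the input so the caller’s list is untouched, which is itself O(M); if you are allowed to reorder the original list, drop the copy and only the N swaps remain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PartialShuffle {
    private static final Random RAND = new Random();

    /**
     * Returns n items chosen uniformly at random from items, without repeats.
     * Performs only the first n steps of a Fisher-Yates shuffle on a copy.
     */
    public static <T> List<T> sample(List<T> items, int n) {
        int m = items.size();
        if (n < 0 || n > m) {
            throw new IllegalArgumentException("n must be between 0 and items.size()");
        }
        List<T> pool = new ArrayList<>(items); // work on a copy
        for (int i = 0; i < n; i++) {
            // Choose a position from the not-yet-fixed tail [i, m) and swap it into slot i.
            int j = i + RAND.nextInt(m - i);
            Collections.swap(pool, i, j);
        }
        // Slots 0..n-1 now hold a uniformly random n-subset, in random order.
        return new ArrayList<>(pool.subList(0, n));
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            items.add(i);
        }
        System.out.println(sample(items, 5)); // five distinct values from 0..99
    }
}
```

Compared with `Collections.shuffle()` followed by `setSize(5)`, this does 5 swaps instead of 99, and the result is still uniformly random because each step draws only from the positions not yet fixed.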
Knuth also discusses these algorithms, in Vol 2, ‘Seminumerical Algorithms’ (section 3.4.2, ‘Random Sampling and Shuffling’).
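One of the techniques Knuth covers there, selection sampling (his Algorithm S), takes a different route: a single pass over the M items that keeps each item with probability (items still needed) / (items not yet examined). It always returns exactly N items, preserves their original order, and never modifies the input. A minimal sketch; the class and method names (`SelectionSample`, `select`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SelectionSample {
    private static final Random RAND = new Random();

    /**
     * Selection sampling: one pass, O(M) time, picks exactly n items
     * and keeps them in their original relative order.
     */
    public static <T> List<T> select(List<T> items, int n) {
        int total = items.size();
        if (n < 0 || n > total) {
            throw new IllegalArgumentException("n must be between 0 and items.size()");
        }
        List<T> chosen = new ArrayList<>(n);
        int seen = 0;
        for (T item : items) {
            if (chosen.size() == n) {
                break; // already have enough
            }
            int remaining = total - seen;     // items not yet examined, including this one
            int needed = n - chosen.size();   // items still to pick (always <= remaining)
            // Keep this item with probability needed / remaining.
            if (RAND.nextInt(remaining) < needed) {
                chosen.add(item);
            }
            seen++;
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            items.add(i);
        }
        System.out.println(select(items, 5)); // five distinct values, in ascending order
    }
}
```

When `needed` equals `remaining`, `nextInt(remaining) < needed` is always true, so the pass can never finish short of N items.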