Is there an efficient way to create an arbitrarily long numpy array where each dimension consists of n elements drawn from a list of length >= n? Each element in the list can be drawn only once for each dimension.
For instance, given the list l = ['cat', 'mescaline', 'popcorn'], I want to be able to type something like np.random.pick_random(l, (3, 2), replace=False) and get an array such as array([['cat', 'popcorn'], ['cat', 'popcorn'], ['mescaline', 'cat']]).
Thank you.
There are a couple of ways of doing this, each with its pros and cons. The following four were just
off the top of my head …
random.sample is simple and built in, though it may not be the fastest.
numpy.random.permutation is again simple, but it creates a copy which we then have to slice, ouch!
numpy.random.shuffle is faster since it shuffles in place, but we still have to slice.
numpy.random.sample is the fastest, but it only works on the interval 0 to 1, so we have to normalize it and convert it to ints to get the random indices, and at the end we still have to slice. Note that normalizing to the size we want does not generate a uniform random distribution.
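As a rough illustration of the four approaches listed above (this is a minimal sketch, not the benchmark code), each one can draw a single row of k elements as follows. The names population and k and the row_* variables are assumptions made for this example.

```python
import random

import numpy as np

population = ['cat', 'mescaline', 'popcorn']  # example list from the question
k = 2  # number of elements to draw per row (assumed for illustration)

# 1. random.sample: returns k distinct elements directly, no slicing needed.
row_sample = random.sample(population, k)

# 2. numpy.random.permutation: returns a shuffled copy, which we then slice.
row_permutation = np.random.permutation(population)[:k]

# 3. numpy.random.shuffle: shuffles the array in place, then we slice.
buffer = np.array(population)
np.random.shuffle(buffer)
row_shuffle = buffer[:k]

# 4. numpy.random.sample: floats in [0, 1) scaled to integer indices.
#    Nothing prevents the same index from being drawn twice.
indices = (np.random.sample(k) * len(population)).astype(int)
row_scaled = [population[i] for i in indices]
```

The first three rows never contain the same element twice, while the fourth can.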
Here are some benchmarks.
and the result:
So it looks like numpy.random.permutation is the worst, which is not surprising. Python's own random.sample is holding its own, and it is a close race between numpy.random.shuffle and numpy.random.sample, with numpy.random.sample edging ahead, so either should suffice. Even though numpy.random.sample has a higher memory footprint I still prefer it, since I really don't need to build the arrays, I just need the random indices …
UPDATE
Unfortunately numpy.random.sample doesn't draw unique elements from a population, so you'll get repetition. Just stick with shuffle, which is just as fast.
UPDATE 2
If you want to remain within numpy to leverage some of its built-in functionality, just convert the values into numpy arrays.
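One way this could look, as a sketch only: convert the list to an array and build every row at once by sorting random keys, so each row gets its own permutation of indices. The function name pick_random is hypothetical (borrowed from the question, it is not a NumPy function), and the argsort trick is a substitute for the shuffle-and-slice loop rather than the code that was benchmarked.

```python
import numpy as np


def pick_random(values, shape, rng=None):
    """Return an array of the given (rows, k) shape where every row holds
    k distinct elements of `values`. Hypothetical helper, not a NumPy API."""
    rng = np.random.default_rng() if rng is None else rng
    values = np.asarray(values)
    rows, k = shape
    if k > values.size:
        raise ValueError("cannot draw more distinct elements than the list holds")
    # Sorting each row of random keys gives an independent random
    # permutation of the indices 0..len(values)-1 for that row.
    order = np.argsort(rng.random((rows, values.size)), axis=1)
    # Keep the first k indices of every row and look up the values.
    return values[order[:, :k]]


result = pick_random(['cat', 'mescaline', 'popcorn'], (3, 2))
```

Elements can still repeat across rows, but never within a row, which is what the question asks for.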
Note that N here is quite large, so you are going to get repeated permutations. By permutations I mean the order of the values, not repeated values within a permutation. Fundamentally there is a finite number of permutations of any given finite set: n! if you permute the whole set, and n!/(n - k)! if you select only k elements. Even if this weren't the case, meaning our set was much larger, we might still get repetitions depending on the random function's implementation, since shuffle, permutation and so on only work with the current set and have no idea of the population. This may or may not be acceptable, depending on what you are trying to achieve. If you want a set of unique permutations, then you are going to have to generate that set and subsample it.
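For a small list, generating the full set of permutations and subsampling it could be sketched like this. The variable names and the choice of four rows are assumptions for illustration, and enumerating every permutation is only practical while n!/(n - k)! stays small.

```python
import itertools
import math
import random

population = ['cat', 'mescaline', 'popcorn']
k = 2  # elements per permutation (assumed for illustration)

# Number of ordered selections of k elements out of n: n! / (n - k)!
total = math.perm(len(population), k)

# Enumerate every ordered selection once, then subsample without replacement,
# so no permutation can appear twice in the result.
all_permutations = list(itertools.permutations(population, k))
unique_rows = random.sample(all_permutations, 4)
```

Asking random.sample for more rows than total raises a ValueError, which makes the finite limit described above explicit.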