The Archive Base

Editorial Team
Asked: June 14, 2026

Is there an efficient way to create an arbitrary long numpy array where each

Is there an efficient way to create an arbitrarily long numpy array where each row consists of n elements drawn from a list of length >= n? Each element in the list can be drawn only once per row.

For example, given the list l = ['cat', 'mescaline', 'popcorn'], I would like to type something like np.random.pick_random(l, (3, 2), replace=False) and get an array such as array([['cat', 'popcorn'], ['cat', 'popcorn'], ['mescaline', 'cat']]).

Thank you.

1 Answer

  1. Editorial Team
    Added an answer on June 14, 2026 at 4:44 am

    There are a couple of ways of doing this, each with its pros and cons. The following four were just
    off the top of my head …

    • Python's own random.sample is simple and built in, though it may not be the fastest…
    • numpy.random.permutation is again simple, but it creates a copy which we then have to slice, ouch!
    • numpy.random.shuffle is faster since it shuffles in place, but we still have to slice.
    • numpy.random.sample is the fastest, but it only works on the interval 0 to 1, so we have
      to scale it and convert it to ints to get the random indices. At the end we
      still have to slice. Note that scaling to the size we want does not generate a uniform random distribution.

    Here are some benchmarks.

    import timeit
    from matplotlib import pyplot as plt
    
    setup = \
    """
    import numpy
    import random
    
    number_of_members = 20
    values = range(50)
    """
    
    number_of_repetitions = 20
    array_sizes = (10, 200)
    
    python_random_times = [timeit.timeit(stmt = "[random.sample(values, number_of_members) for index in xrange({0})]".format(array_size),
                                         setup = setup,                      
                                         number = number_of_repetitions)
                                            for array_size in xrange(*array_sizes)]
    
    numpy_permutation_times = [timeit.timeit(stmt = "[numpy.random.permutation(values)[:number_of_members] for index in xrange({0})]".format(array_size),
                                   setup = setup,
                                   number = number_of_repetitions)
                                        for array_size in xrange(*array_sizes)]
    
    numpy_shuffle_times = [timeit.timeit(stmt = \
                                    """
                                    random_arrays = []
                                    for index in xrange({0}):
                                        numpy.random.shuffle(values)
                                        random_arrays.append(values[:number_of_members])
                                    """.format(array_size),
                                    setup = setup,
                                    number = number_of_repetitions)
                                         for array_size in xrange(*array_sizes)]                                                                    
    
    numpy_sample_times = [timeit.timeit(stmt = \
                                        """
                                        values = numpy.asarray(values)
                                        random_arrays = [values[indices][:number_of_members] 
                                                    for indices in (numpy.random.sample(({0}, len(values))) * len(values)).astype(int)]
                                        """.format(array_size),
                                        setup = setup,
                                        number = number_of_repetitions)
                                             for array_size in xrange(*array_sizes)]                                                                                                                                            
    
    line_0 = plt.plot(xrange(*array_sizes),
                                 python_random_times,
                                 color = 'black',
                                 label = 'random.sample')
    
    line_1 = plt.plot(xrange(*array_sizes),
             numpy_permutation_times,
             color = 'red',
             label = 'numpy.random.permutation'
             )
    
    line_2 = plt.plot(xrange(*array_sizes),
                        numpy_shuffle_times,
                        color = 'yellow',
                        label = 'numpy.random.shuffle')
    
    line_3 = plt.plot(xrange(*array_sizes),
                        numpy_sample_times,
                        color = 'green',
                        label = 'numpy.random.sample')
    
    plt.xlabel('Number of Arrays')
    plt.ylabel('Time in (s) for %i rep' % number_of_repetitions)
    plt.title('Different ways to sample.')
    plt.legend()
    
    plt.show()
    

    and the result:

    [Figure: time in seconds for 20 repetitions versus number of arrays, one line per sampling method]

    So it looks like numpy.random.permutation is the worst, which is not surprising, while Python's own random.sample is holding its own. The close race is between numpy.random.shuffle and numpy.random.sample, with numpy.random.sample edging ahead, so either should suffice. Even though numpy.random.sample has a higher memory footprint, I still prefer it, since I don't really need to build the arrays; I just need the random indices …

    $ uname -a
    Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386
    
    $ python --version
    Python 2.6.1
    
    $ python -c "import numpy; print numpy.__version__"
    1.6.1
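
The benchmark above targets Python 2 (`xrange`, `print` as a statement). For readers on Python 3, a reduced version of the same timing comparison might look like the sketch below. It assumes a recent NumPy, fixes the number of arrays at 100 instead of sweeping it, and leaves out the plot; timings will differ from machine to machine, so no results are claimed here.

```python
import timeit

setup = """
import random
import numpy

number_of_members = 20
values = list(range(50))  # a real list, so numpy.random.shuffle can reorder it in place
"""

statements = {
    'random.sample':
        "[random.sample(values, number_of_members) for _ in range(100)]",
    'numpy.random.permutation':
        "[numpy.random.permutation(values)[:number_of_members] for _ in range(100)]",
    'numpy.random.shuffle': (
        "random_arrays = []\n"
        "for _ in range(100):\n"
        "    numpy.random.shuffle(values)\n"
        "    random_arrays.append(values[:number_of_members])\n"
    ),
}

number_of_repetitions = 20

# Time each statement with the same setup and repetition count.
timings = {
    name: timeit.timeit(stmt=stmt, setup=setup, number=number_of_repetitions)
    for name, stmt in statements.items()
}

for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print("%-26s %.4f s" % (name, seconds))
```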
    

    UPDATE

    Unfortunately numpy.random.sample doesn't draw unique elements from a population, so you'll get repetition. Just stick with shuffle, which is just as fast.
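
The repetition problem is easy to see with a small check. The sketch below mirrors the index construction used in the numpy.random.sample benchmark (scale floats from [0, 1) to integer indices, then slice) and counts how many rows pick the same index more than once. The row count of 1000 is an arbitrary choice for illustration.

```python
import numpy as np

values = np.asarray(['cat', 'mescaline', 'popcorn'])
number_of_rows = 1000
number_of_members = 2

# Scale uniform floats up to integer indices, then keep the first
# number_of_members columns, as in the benchmark statement.
indices = (np.random.sample((number_of_rows, len(values))) * len(values)).astype(int)
indices = indices[:, :number_of_members]

# Each index is drawn independently, so a row may contain the same index twice.
rows_with_repeats = sum(
    len(set(row)) < number_of_members for row in indices.tolist()
)
print(rows_with_repeats, "of", number_of_rows, "rows contain a repeated element")

# A sliced permutation, by contrast, can never repeat an index.
permuted = np.random.permutation(len(values))[:number_of_members]
print(values[permuted])
```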

    UPDATE 2

    If you want to remain within numpy to leverage some of its built-in functionality, just convert the values into numpy arrays.

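    import numpy as np

    values = ['cat', 'popcorn', 'mescaline']
    number_of_members = 2
    N = 1000000
    # One row per sample; every row starts out as a copy of values.
    random_arrays = np.asarray([values] * N)
    # Each row is a view into random_arrays, so shuffling it updates the 2-D array in place.
    for row in random_arrays:
        np.random.shuffle(row)
    # The first number_of_members columns of each shuffled row form the sample.
    subset = random_arrays[:, :number_of_members]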
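
The Python-level loop over a million rows can be avoided. One alternative, which is not part of the original answer, is to sort a matrix of random keys: the argsort of each row of independent uniform values is a random permutation of the column indices. The sketch below assumes NumPy 1.17 or later for `np.random.default_rng`, and uses a smaller N purely to keep the example quick.

```python
import numpy as np

values = np.asarray(['cat', 'popcorn', 'mescaline'])
number_of_members = 2
N = 1000

rng = np.random.default_rng()

# One random key per (row, element); sorting the keys in each row gives
# a random ordering of the column indices 0 .. len(values) - 1.
random_keys = rng.random((N, len(values)))
row_permutations = np.argsort(random_keys, axis=1)

# Keep the first number_of_members indices per row and look up the values.
subset = values[row_permutations[:, :number_of_members]]
print(subset[:5])
```

Because each row of indices is a permutation, no element can appear twice within a row, although the same row may of course occur more than once across the array.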
    

    Note that N here is quite large, so you are going to get repeated permutations. By that I mean the same ordering of values turning up more than once, not repeated values within a permutation. Fundamentally, there is a finite number of permutations of any given finite set: n! when ordering the whole set, and n!/(n - k)! when selecting only k elements. Even if this weren't the case, meaning our set was much larger, we might still get repetitions depending on the random function's implementation, since shuffle, permutation and so on only work with the current set and have no idea of the population. This may or may not be acceptable, depending on what you are trying to achieve. If you want a set of unique permutations, then you will have to generate that set and subsample it.
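
For a small list, generating the full set of arrangements and subsampling it can be sketched with the standard library. The variable `number_wanted` is a hypothetical parameter introduced for this illustration; the approach is only practical while n!/(n - k)! stays small enough to hold in memory.

```python
import itertools
import math
import random

values = ['cat', 'popcorn', 'mescaline']
number_of_members = 2

# Number of ordered selections of k elements out of n: n! / (n - k)!
n = len(values)
expected_count = math.factorial(n) // math.factorial(n - number_of_members)

# Enumerate every ordered selection exactly once.
all_arrangements = list(itertools.permutations(values, number_of_members))
print(expected_count, len(all_arrangements))  # both are 6 for n = 3, k = 2

# Subsample without replacement to get rows that are all different.
number_wanted = 4  # hypothetical; must not exceed expected_count
unique_rows = random.sample(all_arrangements, number_wanted)
print(unique_rows)
```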

© 2021 The Archive Base. All Rights Reserved