Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9289823
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 18, 20262026-06-18T20:16:20+00:00 2026-06-18T20:16:20+00:00

I have a list of items from which I want to randomly sample a

  • 0

I have a list of items from which I want to randomly sample a subset, but each item is paired with a histogram over D bins and I want to sample the items in such a way that the summed histogram is approximately uniform.

Thus it should work as the sample function below:

>>> import numpy
>>> #The histograms from which to sample (each having 5 bins):
>>> data = numpy.random.randint(100, size=(10000,5))
>>> #The function which I'm trying to program:
>>> samples = sample(data,500)
>>> samples.shape
(500,5)
>>> summed_histogram = samples.sum(axis=0)
>>> #Each bin should have approximately equal value
>>> summed_histogram / float(summed_histogram.sum())
array([ 0.2,  0.2,  0.2,  0.2,  0.2])

The absolute values of the summed histogram are not important nor does it need to be exactly uniform, it just needs to be approximately uniform. Also, I don’t care if the returned sample size is not exactly the specified sample size. The sampling should be without replacement.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-18T20:16:22+00:00Added an answer on June 18, 2026 at 8:16 pm

    To expand on @Ilmari Karonen’s solution, what you want to do is compute weights for each histogram and then sample according to those weights. It appears to me that the most efficient way to do this, given your goal, would be with a linear program.

    Let D_ij be the weight of the jth bin in the histogram of the ith item. Then if each item is weighted with weight w_i, the “summed histogram” would have weight sum(i in items) w_i D_ij. One way to get your “approximately uniform” distribution would be to minimize the maximum difference across bins, so we would solve the following LP:

    minimize z
    subject to (for all j, k) 
        z >= (sum i in items) w_i D_ij - (sum i in items) w_i D_ik
        z >= (sum i in items) w_i D_ik - (sum i in items) w_i D_ij
    

    The above is basically saying that z >= absolute value of difference across all weighted pairs of bins. To solve this LP you will need a separate package since numpy does not include a LP solver. See this gist for a solution using cplex or this gist for a solution using cvxpy. Note that you will need to set some constraints on the weights (e.g. each weight is larger or equal to 0) as these solutions do. Other python bindings for GLPK (GNU Linear Programming kit) can be found here: http://en.wikibooks.org/wiki/GLPK/Python.

    Finally you just sample from histogram i with weight w_i. This can be done with an adaptation of roulette selection using cumsum and searchsorted as suggested by @Ilmari Karonen, see this gist.

    If you wanted the resulting weighted distribution to be “as uniform as possible”, I would solve a similar problem with weights, but maximize the weighted entropy across the weighted sum of bins. This problem would appear to be nonlinear although you could use any number of nonlinear solvers such as BFGS or gradient-based methods. This would probably be a bit slower than the LP method but it depends on what you need in your application. The LP method would approximate the nonlinear method very closely if you have a large number of histograms, because it would be easy to reach a uniform distribution.

    When using the LP solution, a bunch of the histogram weights may bind to 0 because the number of constraints is small, but this will not be a problem with a non-trivial number of bins, since the number of constraints is O(n^2).

    Example weights with 50 histograms and 10 bins:

    [0.006123642775837011, 0.08591660144140816, 0.0, 0.0, 0.0, 0.0, 0.03407525280610657, 0.0, 0.0, 0.0, 0.07092537493489116, 0.0, 0.0, 0.023926802333318554, 0.0, 0.03941537854267549, 0.0, 0.0, 0.0, 0.0, 0.10937063438351756, 0.08715770469631079, 0.0, 0.05841899435928017, 0.016328676622408153, 0.002218517959171183, 0.0, 0.0, 0.0, 0.08186919626269101, 0.03173286609277701, 0.08737065271898292, 0.0, 0.0, 0.041505225727435785, 0.05033635148761689, 0.0, 0.09172214842175723, 0.027548495513552738, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0259929997624099, 0.0, 0.0, 0.028044483157851748, 0.0, 0.0, 0.0]
    

    With 50 histograms each with 50 bins, now very few zero values:

    [0.0219136051655165, 0.0, 0.028325808078797768, 0.0, 0.040889043180965624, 0.04372501089775975, 0.0, 0.031032870504105477, 0.020745831040881676, 0.04794861828714149, 0.0, 0.03763592540998652, 0.0029093177405377577, 0.0034239051136138398, 0.0, 0.03079554151573207, 0.0, 0.04676278554085836, 0.0461258666541918, 9.639105313353352e-05, 0.0, 0.013649362063473166, 0.059168272186891635, 0.06703936360466661, 0.0, 0.0, 0.03175895249795131, 0.0, 0.0, 0.04376133487616099, 0.02406633433758186, 0.009724226721798858, 0.05058252335384487, 0.0, 0.0393763638188805, 0.05287112817101315, 0.0, 0.0, 0.06365320629437914, 0.0, 0.024978299494456246, 0.023531082497830605, 0.033406648550332804, 0.012693750980220679, 0.00274892002684083, 0.0, 0.0, 0.0, 0.0, 0.04465971034045478, 4.888224154453002]
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Say I have a list of items from a database. Next to each item
I have a method which is list my items from a server. Therefore, this
I have a big list of over 20000 items to be fetched from DB
I have Collection List<Car> . How to compare each item from this collection with
My goal is to have a list of items each item has an image
I have a List of Items which I retrieved from my Sqlite DB... I
I have a view controller(A) which shows a list of items download from server.
I have a list of strings, from which I want to locate every line
I have a generic list of objects from which I want to whittle down
I have a list of items which is fetched from the local database. Every

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.