Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8958723
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 15, 20262026-06-15T15:16:48+00:00 2026-06-15T15:16:48+00:00

I’m looking for a very efficient way to calculate all possible combinations from n

  • 0

I’m looking for a very efficient way to calculate all possible combinations from n lists and then keep the combination with the smallest least squares difference.

I already have a code that does it, but when it gets to several million combinations things get slow.

candidates_len contains a list of lists with lengths i.e. [[500, 490, 510, 600][300, 490, 520][305, 497, 515]]
candidates_name contains a list of lists with names i.e. [[‘a’, ‘b’, ‘c’, ‘d’][‘mi’, ‘mu’, ‘ma’][‘pi’, ‘pu’, ‘pa’]]

Both lists have n lists.

#    Creating the possible combinations and store the lists of lengths in vector r
r=[[]]
for x in candidates_len:
    r = [ i + [y] for y in x for i in r ]
#Storing the names of the combinations and store the lists of identifiers in vector z
z=[[]]
for x in candidates_name:
    z = [ i + [y] for y in x for i in z ]          
#Calculating distances and storing the minimum one
min_index = 0
min_dist = 0
list_best = []
for index, item in enumerate(r):
    n = 0
    dist = 0
    while n < len(candidates_len):
        for i in range(n,len(candidates_len)):
            dist = dist + (item[n]-item[i])**2
        n=n+1
    if index==0:
            min_dist = dist
            min_index = index
            list_best.append(item)
    elif dist < min_dist:
        min_dist = dist
        min_index = index
        list_best = []
        list_best.append(z[index])
least_combination = min_index

A hard case:
http://pastebin.com/BkVQTQWK

Here are some test times. Would be good to get under a minute or so. I don’t know if it’s possible though.

combinations time(s)
77760   1.255663
41184   1.580333
69120   6.214786
8960   1.131834
537600  14.855361
89100   1.264126
16384   3.247404
4199040 666.853284
226800   3.935878
9289728 679.064149
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-15T15:16:49+00:00Added an answer on June 15, 2026 at 3:16 pm

    My first thought here is that you’re spending an awful lot of time building up lists that you don’t need. At the very least, scrapping them would make things a lot simpler, but no guarantees it would actually make things faster:

    r = itertools.product(*candidates_len)
    z = itertools.product(*candidates_name)
    
    min_dist = None
    for item, names in itertools.izip(r, z):
      dist = 0
      for n in range(len(item)):
        for i in range(n, len(item)):
          dist += (item[n]-item[i])**2
      if min_dist is None or dist < min_dist:
        min_dist = dist
        best = item, names
    
    print(best)
    

    With your test data, the explicit lists are taking up gigabytes of memory. I’m not sure how much—my poor 4GB laptop went into swap thrash hell before it even finished generating the z list, and everything slowed to a crawl. The whole operation takes less time with itertools than just the setup part without it… That probably isn’t true on a machine with, say, 16GB of RAM, but still, why use the memory if you don’t need it?

    My next thought is that all you’re doing is calculating the LSD on a bunch of arrays. Do you have a huge array of tiny arrays? If so, can you de-jagged them (e.g., filling them out with None) and numpy the whole thing? On the other hand, if it’s an array of big arrays, you probably want a list (or, as above, an iterator) of numpy arrays, so at least you can vectorize one dimension.

    Either way, vectorization is the key to optimizing anything that involves simple operations over big arrays, and numpy will often do vectorization better than anything an expert C++-and-Fortran-and-platform-specific-assembly coder is likely to do manually.

    Without either thinking through the code or trying to understand the algorithm at a deep level, my first attempt would be to generate r as a sequence (as in my code above) but of numpy row vectors (something like matrix(x, dtype=int) for x in itertools.product(*candidates_len)). Then you can calculate the differences on each item by just item - item.T, then sum up the squares of the lower triangle (which I’d have to look up to figure out how to do). You could probably then further improve the performance by figuring out a way to only calculate the lower triangle in the first place. The typical trick to this is figuring out how to get the lower-triangle sum into the diagonal as part of the vectorized operation, and then you just extract the diagonal, but that isn’t always appropriate. See the broadcasting docs for some ideas on how to vectorize the inner loop without creating explicit matrices. Finally, see if there’s a way you can create a 3D array out of the whole thing (which may mean padding the individual items out to a fixed width) and then vectorizing the whole operation. (The memory use won’t be nearly as bad, as numpy only needs to allocate 4 bytes for each value instead of a whole PyObject… but it may still be bad enough that you lose more than you gain.) Sorry if this is all a little vague, but hopefully it’s enough to get you started on the experimentation.

    Another thought is that you can probably parallelize this. Any machine with enough memory to handle that massive list, I’m willing to bet it has at least 4 cores on it. And you’ve got a long sequence of completely independent operations, which is the easiest thing in the world to parallelize. As a first step, create a multiprocessing.Pool, and make the outer loop submit jobs to the pool instead of doing the work directly. You’ll probably find that the jobs are too small, so you’re drowning in overhead, but then you can always batch up each N items (either explicitly, or see the grouper recipe in the itertools docs), and make the job be “loop over these N items, and return the one with the min LSD”. (It may take some tweaking to find the optimal N.) You can even do this together with the top-level numpy, by splitting the giant array up into chunks along the x axis and farming those out as jobs.

    One more thought: Your algorithm is starting with a product of N*M, each element of which is of length N. Then, for each element, you loop over it twice. So, the best possible performance is going to be O(N^3*M). Is this really the right algorithm? If so, are you actually getting N^3*M performance from your algorithm? If the answer to either question is no, you shouldn’t be trying to micro-optimize it. Only when you’ve actually got the most efficient algorithm, coded properly, is it worth doing things like vectorizing, avoiding excess work, moving tight loops into C++ and Fortran, etc. Otherwise, you’re just going to come back and say “But it still blows up when I go to 4x as big as my last test run”.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a text area in my form which accepts all possible characters from
I have a string like this: La Torre Eiffel paragonata all&#8217;Everest What PHP function
Let's say I'm outputting a post title and in our database, it's Hello Y&#8217;all
I have a jquery bug and I've been looking for hours now, I can't
link Im having trouble converting the html entites into html characters, (&# 8217;) i
For some reason, after submitting a string like this Jack’s Spindle from a text
I am currently running into a problem where an element is coming back from
I'm parsing an RSS feed that has an &#8217; in it. SimpleXML turns this
I'm trying to convert HTML to plain text. I get many &\#8217; &\#8220; etc.
Does anyone know how can I replace this 2 symbol below from the string

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.