I have 200k lists stored in a MySQL database. Given a list A, I

Question

0

Asked: June 18, 20262026-06-18T00:12:22+00:00 2026-06-18T00:12:22+00:00

I have 200k lists stored in a MySQL database. Given a list A, I

0

I have 200k lists stored in a MySQL database. Given a list A, I need to calculate a similarity score between A and each list X of the 200k lists. Assume the similarity metric is something simple such as the length of the set intersection of A and X.

Given the nature of pairwise comparison, I couldn’t think of a way to improve on O(N) for this, so improving the runtime means working with multiple CPU cores. Right now I have this task split across 4 cores using multithreading.Pool(), but it still takes nearly 10 minutes to complete. Worse, my computer shuts down to protect itself.

For anyone who’s dealt with this before, do you have an alternative method that you can share?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-18T00:12:23+00:00

Editorial Team

2026-06-18T00:12:23+00:00Added an answer on June 18, 2026 at 12:12 am

def bestmatch(A, lists):
     a = set(A)
     return min(lists, key=lambda x:  len(set(x) & a)

Using min does the looping a C speed. The lambda is a closure that references a quickly. The set(A) step is computed only once, rather than in the inner-loop.

0

Reply
Share
Share

- Report

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have 200k lists stored in a MySQL database. Given a list A, I

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply