I have 200k lists stored in a MySQL database. Given a list A, I need to calculate a similarity score between A and each list X of the 200k lists. Assume the similarity metric is something simple such as the length of the set intersection of A and X.
Given the nature of pairwise comparison, I couldn’t think of a way to improve on O(N) for this, so improving the runtime means working with multiple CPU cores. Right now I have this task split across 4 cores using multithreading.Pool(), but it still takes nearly 10 minutes to complete. Worse, my computer shuts down to protect itself.
For anyone who’s dealt with this before, do you have an alternative method that you can share?
Using min does the looping a C speed. The lambda is a closure that references a quickly. The
set(A)step is computed only once, rather than in the inner-loop.