I’m working on a project that requires finding the most intersected set among a

Question


Asked: May 25, 2026 (2026-05-25T12:45:34+00:00)


I’m working on a project that requires finding the most intersected set among a great number of other sets.

That is, I have a large number (~300k) of sets with hundreds of entries each. Given one of the sets, I need to rank the other sets by how much they intersect with it. Additionally, the set entries have properties that can be used as a filter, e.g. for set X, order the other sets by how much they intersect with the subset of X's “green” entries.
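To make the requirement concrete, here is a minimal brute-force sketch in Python. The names (`rank_by_intersection`, `keep`) and the representation of an entry as an `(entry_id, color)` tuple are assumptions made for illustration only. It scans every set on each query, so it states the desired result rather than a design that would necessarily hold up at ~300k sets.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical representation: each entry is (entry_id, color).
Entry = Tuple[str, str]


def rank_by_intersection(
    sets: Dict[str, Set[Entry]],
    query_id: str,
    keep: Optional[Callable[[Entry], bool]] = None,
) -> List[Tuple[str, int]]:
    """Rank all other sets by the size of their intersection with the query set.

    If `keep` is given, only query entries for which keep(entry) is true
    take part in the comparison (e.g. only the "green" entries).
    """
    query = sets[query_id]
    if keep is not None:
        query = {entry for entry in query if keep(entry)}

    scores = [
        (other_id, len(query & members))
        for other_id, members in sets.items()
        if other_id != query_id
    ]
    # Largest overlap first; ties are broken by set id for a stable order.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores


example_sets = {
    "X": {("a", "green"), ("b", "red"), ("c", "green")},
    "Y": {("a", "green"), ("b", "red")},
    "Z": {("a", "green"), ("c", "green"), ("d", "blue")},
}

unfiltered = rank_by_intersection(example_sets, "X")
green_only = rank_by_intersection(example_sets, "X", keep=lambda e: e[1] == "green")
```

In this toy data, Y and Z both share two entries with X, but restricting X to its green entries puts Z ahead of Y.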

I have free rein to architect this solution, and I’m looking for technology recommendations. I was initially thinking a relational DB would be best suited, but I’m not sure how well it would perform doing these comparisons in real time. Somebody recommended Lucene, but I’m not sure how well that would fit the bill.

I suppose it’s worth mentioning that new sets will be added regularly and that the sets may grow, but never shrink.


1 Answer

Editorial Team · Answer 1 · 2026-05-25T12:45:34+00:00


I don’t know exactly what you are looking for: method, library, tool?

If you want to process your large datasets quickly using distributed computing, you should check out MapReduce, e.g. running Hadoop on Amazon EC2/S3 services.
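As a rough illustration of how the problem could be phrased in MapReduce terms, here is a single-process Python sketch. It does not use Hadoop or any real MapReduce API; the function names (`map_phase`, `shuffle`, `reduce_phase`) are hypothetical stand-ins for the stages a framework would run across machines. The map step emits one `(entry, set_id)` pair per membership, the shuffle groups set ids by entry, and the reduce step counts, for a given query set, how many entries each other set shares with it.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple


def map_phase(sets: Dict[str, Set[str]]) -> Iterator[Tuple[str, str]]:
    """Emit one (entry, set_id) pair for every entry of every set."""
    for set_id, members in sets.items():
        for entry in members:
            yield entry, set_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group set ids by entry, as a framework would do between map and reduce."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entry, set_id in pairs:
        grouped[entry].append(set_id)
    return grouped


def reduce_phase(grouped: Dict[str, List[str]], query_id: str) -> Counter:
    """Count shared entries between the query set and every other set."""
    counts: Counter = Counter()
    for set_ids in grouped.values():
        if query_id in set_ids:
            for set_id in set_ids:
                if set_id != query_id:
                    counts[set_id] += 1
    return counts


example_sets = {
    "X": {"a", "b", "c"},
    "Y": {"a", "b"},
    "Z": {"c", "d"},
}

overlap = reduce_phase(shuffle(map_phase(example_sets)), "X")
ranking = overlap.most_common()  # sets ordered by shared-entry count
```

The grouped structure is effectively an inverted index from entry to sets. A property filter such as "green" could be applied by skipping entries in the reduce step, though that is an assumption about how the data would be modeled.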
