My table: CREATE TABLE `beer`.`matches` ( `id` int(10) unsigned NOT NULL AUTO_INCREMENT, `hashId` int(10)

Question

Asked: June 1, 2026 (2026-06-01T02:11:34+00:00)

My table:

CREATE TABLE `beer`.`matches` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `hashId` int(10) unsigned NOT NULL,
  `ruleId` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

If a hash has matched a rule, there’s an entry in this table.

1) Count how many hashIds there are for each unique ruleId (AKA “how many hashes matched each rule”)

SELECT COUNT(*), ruleId FROM `beer`.`matches` GROUP BY ruleId ORDER BY COUNT(*)

2) Select the 10 best rules (ruleIds), that is, the 10 rules that, combined, match the greatest number of unique hashes. This means that a rule that matches a lot of hashes is not necessarily a good rule if another rule covers all the same hashes. Basically, I want to select the 10 ruleIds that catch the most unique hashIds.

EDIT: Basically I have a sub-optimal solution in PHP/SQL here, but depending on the data it doesn’t necessarily give me the best answer to question 2). I’d be interested in a better solution. Read the comments for more information.

1 Answer

Editorial Team · Answer 1 · 2026-06-01T02:11:35+00:00

If you really want the optimal solution, you have to check every possible combination of 10 ruleIds and count how many distinct hashIds each combination covers. The problem is that the number of combinations is roughly (number of distinct ruleIds) ^ 10. In fact it is smaller, because a ruleId cannot be repeated within a combination and the order does not matter: it is the number of combinations of m elements taken 10 at a time.

NOTE: To be exact, the number of possible combinations is

m!/(n!(m-n)!) => m!/(10!(m-10)!) where ! is factorial: m! = m * (m-1) * (m-2) * … * 3 * 2 * 1
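To get a feel for how quickly this number grows, here is a small sketch that evaluates the formula for a few values of m. The function name `count_rule_combinations` is made up for illustration; `math.comb` is the standard-library binomial coefficient.

```python
import math


def count_rule_combinations(num_rules: int, group_size: int = 10) -> int:
    """Number of ways to pick group_size distinct rules out of num_rules,
    ignoring order: m! / (n! * (m - n)!)."""
    return math.comb(num_rules, group_size)


# Hypothetical rule counts, only to show the growth.
for m in (20, 50, 100):
    print(m, count_rule_combinations(m))
# 20 184756
# 50 10272278170
# 100 17310309456440
```

Even with only 100 distinct ruleIds there are already more than 17 trillion groups of 10 to evaluate.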

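To build these combinations you have to join the list of distinct ruleIds with itself 10 times, excluding the ruleIds already used in the combination (comparing with > instead of <> also avoids getting the same set again in a different order), somewhat like this:

select m1.ruleid r1, m2.ruleid r2, m3.ruleid r3 ...
from (select distinct ruleid from matches) m1
   inner join (select distinct ruleid from matches) m2 on m2.ruleid > m1.ruleid
   inner join (select distinct ruleid from matches) m3 on m3.ruleid > m2.ruleid
     ...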

Then you have to find the combination with the highest count in

select r1, r2, r3..., count(distinct M.hashid)
from ("here the combinations of 10 ruleIds defined above") G10
inner join matches M
  on M.ruleid = r1 or M.ruleid = r2 or M.ruleid = r3...
group by r1, r2, r3...

This gigantic query would take a lot of time to run.
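For small inputs, the same exhaustive search can be sketched outside the database. The following is only an illustration, assuming the rows have been fetched as (hashId, ruleId) pairs; the function name `best_rule_set` and the sample data are hypothetical.

```python
from itertools import combinations


def best_rule_set(matches, k):
    """Try every group of k distinct rules and return (rules, covered_count)
    for the group that covers the most distinct hashIds.

    matches: iterable of (hashId, ruleId) pairs, one per row of the table.
    Returns ((), 0) if there are fewer than k distinct rules.
    """
    hashes_by_rule = {}
    for hash_id, rule_id in matches:
        hashes_by_rule.setdefault(rule_id, set()).add(hash_id)

    best_rules, best_cover = (), 0
    # combinations() yields each set of rules once, in sorted order.
    for combo in combinations(sorted(hashes_by_rule), k):
        covered = set().union(*(hashes_by_rule[r] for r in combo))
        if len(covered) > best_cover:
            best_rules, best_cover = combo, len(covered)
    return best_rules, best_cover


# Made-up sample: rule 1 -> {1,2,3,4}, rule 2 -> {1,2,5}, rule 3 -> {5,6,7}
sample = [(1, 1), (2, 1), (3, 1), (4, 1),
          (1, 2), (2, 2), (5, 2),
          (5, 3), (6, 3), (7, 3)]
print(best_rule_set(sample, 2))  # ((1, 3), 7)
```

The loop runs once per combination, so it has the same combinatorial cost as the query and is only usable when the number of distinct rules is small.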

There are much faster procedures, but they will only give you sub-optimal results.
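One common example of such a procedure (not something named in this answer, so treat it as an assumption about what "faster" could mean here) is a greedy pick: repeatedly take the rule that adds the most hashIds not yet covered. The sketch below uses a hypothetical `greedy_rule_set` function and made-up data chosen to show a case where the greedy result is worse than the optimum.

```python
def greedy_rule_set(matches, k):
    """Greedy heuristic: pick up to k rules, each time choosing the rule
    that covers the most hashIds not covered so far.

    matches: iterable of (hashId, ruleId) pairs.
    Returns (chosen_rules, covered_count). Not guaranteed to be optimal.
    """
    hashes_by_rule = {}
    for hash_id, rule_id in matches:
        hashes_by_rule.setdefault(rule_id, set()).add(hash_id)

    chosen, covered = [], set()
    for _ in range(k):
        # Sorted so that ties are broken by the smallest ruleId.
        candidates = [r for r in sorted(hashes_by_rule) if r not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda r: len(hashes_by_rule[r] - covered))
        if not hashes_by_rule[best] - covered:
            break  # no remaining rule adds anything new
        chosen.append(best)
        covered |= hashes_by_rule[best]
    return chosen, len(covered)


# Made-up sample: rule 1 -> {1,2,3,4}, rule 2 -> {1,2,5}, rule 3 -> {3,4,6}
sample = [(1, 1), (2, 1), (3, 1), (4, 1),
          (1, 2), (2, 2), (5, 2),
          (3, 3), (4, 3), (6, 3)]
print(greedy_rule_set(sample, 2))  # ([1, 2], 5)
```

With this data the greedy pick covers 5 hashes, while rules 2 and 3 together would cover all 6, which is exactly the kind of data-dependent miss a sub-optimal procedure can produce.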

SOME OPTIMIZATION:

This could be optimized somewhat, depending on the shape of the data, by looking for groups (the set of hashIds matched by one rule) that are equal to or included in other groups, and discarding them. This would require fewer than (m*(m+1))/2 comparisons, which is tiny next to the number of combinations, and it pays off especially if it is quite likely that several groups can be discarded, since that lowers m. Even so, the main query still has a gigantic cost.
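The pruning step could look roughly like the sketch below, assuming the rows are available as (hashId, ruleId) pairs. The function name `drop_dominated_rules` and the sample data are hypothetical. When two rules match exactly the same hashes, the one with the smaller ruleId is kept.

```python
def drop_dominated_rules(matches):
    """Return the ruleIds whose hash set is not equal to or contained in
    the hash set of another kept rule.

    matches: iterable of (hashId, ruleId) pairs.
    """
    hashes_by_rule = {}
    for hash_id, rule_id in matches:
        hashes_by_rule.setdefault(rule_id, set()).add(hash_id)

    # Largest groups first: a group can only be contained in a group that is
    # at least as large, so it only needs comparing with rules seen earlier.
    ordered = sorted(hashes_by_rule, key=lambda r: (-len(hashes_by_rule[r]), r))
    kept = []
    for rule_id in ordered:
        group = hashes_by_rule[rule_id]
        if not any(group <= hashes_by_rule[other] for other in kept):
            kept.append(rule_id)
    return sorted(kept)


# Made-up sample: rule 1 -> {1,2,3,4}, rule 2 -> {1,2},
#                 rule 3 -> {3,4,5},   rule 4 -> {1,2,3,4}
sample = [(1, 1), (2, 1), (3, 1), (4, 1),
          (1, 2), (2, 2),
          (3, 3), (4, 3), (5, 3),
          (1, 4), (2, 4), (3, 4), (4, 4)]
print(drop_dominated_rules(sample))  # [1, 3]
```

Discarding such rules does not lower the best achievable coverage, because any discarded rule can be swapped for the rule that contains it. It only shrinks m before the expensive combination step.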
