Suppose I have a table RULES with 3 columns A, B, and C. As data enters the system, I want to know whether any row of the RULES table matches my data, with the condition that a NULL in a column of the RULES table matches any value for that column. The obvious SQL is:
SELECT * FROM RULES WHERE (A = :a OR A IS NULL) AND (B = :b OR B IS NULL) AND (C = :c OR C IS NULL)
So if I have rules:
RULE  A     B     C
1     50    NULL  NULL
2     51    xyz   NULL
3     51    NULL  123
4     NULL  xyz   456
An input of (50, xyz, 456) will match rules 1 and 4.
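To make the example concrete, here is a minimal sketch that runs the query above against the four sample rules in an in-memory SQLite database. The column types and the RULE_ID column name are assumptions made for illustration; the question does not state the real schema or database.

```python
import sqlite3

# In-memory database; column types and the RULE_ID name are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RULES (RULE_ID INTEGER PRIMARY KEY, A INTEGER, B TEXT, C INTEGER)"
)
conn.executemany(
    "INSERT INTO RULES VALUES (?, ?, ?, ?)",
    [
        (1, 50, None, None),
        (2, 51, "xyz", None),
        (3, 51, None, 123),
        (4, None, "xyz", 456),
    ],
)

# A NULL in a rule column acts as a wildcard for that column.
query = """
SELECT RULE_ID FROM RULES
WHERE (A = :a OR A IS NULL)
  AND (B = :b OR B IS NULL)
  AND (C = :c OR C IS NULL)
ORDER BY RULE_ID
"""
matched = [row[0] for row in conn.execute(query, {"a": 50, "b": "xyz", "c": 456})]
print(matched)  # [1, 4]
```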
Question: Is there a better way to do this? With only 3 fields this is no problem. But the actual table will have 15 columns and I worry about how well that SQL scales.
Speculation: An alternative SQL statement I came up with involves adding an extra column to the table holding a count of how many fields are not null. (So in the example, this column's value for rules 1-4 is 1, 2, 2 and 2 respectively.) With this ‘col_count’ column, the select could be:
SELECT * FROM RULES WHERE (CASE WHEN A = :a THEN 1 ELSE 0 END) + (CASE WHEN B = :b THEN 1 ELSE 0 END) + (CASE WHEN C = :c THEN 1 ELSE 0 END) = COL_COUNT
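As a quick sanity check, the sketch below runs the counting variant on the same four sample rules in an in-memory SQLite database. The schema, the RULE_ID name, and computing COL_COUNT at insert time are all assumptions for illustration; it only checks that the result agrees with the example, and says nothing about performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RULES (RULE_ID INTEGER PRIMARY KEY, "
    "A INTEGER, B TEXT, C INTEGER, COL_COUNT INTEGER)"
)

rules = [
    (1, 50, None, None),
    (2, 51, "xyz", None),
    (3, 51, None, 123),
    (4, None, "xyz", 456),
]
# COL_COUNT is the number of non-NULL rule columns (1, 2, 2, 2 here).
conn.executemany(
    "INSERT INTO RULES VALUES (?, ?, ?, ?, ?)",
    [(rid, a, b, c, sum(v is not None for v in (a, b, c))) for rid, a, b, c in rules],
)

# A rule matches when every non-NULL column equals the input value,
# i.e. the number of equal columns is the same as COL_COUNT.
count_query = """
SELECT RULE_ID FROM RULES
WHERE (CASE WHEN A = :a THEN 1 ELSE 0 END)
    + (CASE WHEN B = :b THEN 1 ELSE 0 END)
    + (CASE WHEN C = :c THEN 1 ELSE 0 END) = COL_COUNT
ORDER BY RULE_ID
"""
count_matched = [
    row[0] for row in conn.execute(count_query, {"a": 50, "b": "xyz", "c": 456})
]
print(count_matched)  # [1, 4]
```

Note that this variant requires COL_COUNT to be kept in sync whenever users add or change a rule.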
Unfortunately, I don’t have enough sample data to find out which of these approaches would perform better. Before I start creating random rules, I thought I’d ask here whether there was a better approach.
Note: Data mining techniques and column constraints are not feasible here. The data must be checked as it enters the system and so it can be flagged pass/fail immediately. And, the users control the addition or removal of rules so I can’t convert the rules into column constraints or other data definition statements.
One last thing, in the end I need a list of all the rules that the data fails to pass. The solution cannot abort at the first failure.
Thanks.
The first query you provided is perfect. I really doubt that adding the column you mentioned would give you any more speed, since every entry is effectively checked for NOT NULL anyway: a comparison to NULL never evaluates to true. So I would guess that x = y is expanded to x IS NOT NULL AND x = y internally. Maybe someone else can clarify that.

All other optimizations I can think of would involve precalculation or caching. You can create [temporary] tables matching certain rules or add further columns holding matching rules.
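The NULL behaviour mentioned in this answer can be observed with a small sketch, here using SQLite and a hypothetical one-column table T. It only shows that both forms of the predicate select the same rows; it does not show what any particular engine does internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (X INTEGER)")  # hypothetical table for illustration
conn.executemany("INSERT INTO T VALUES (?)", [(None,), (50,), (51,)])

# Comparing NULL with a value yields NULL (unknown), which a WHERE clause
# treats as "not true", so the row is filtered out.
null_eq = conn.execute("SELECT NULL = 50").fetchone()[0]

plain = conn.execute("SELECT COUNT(*) FROM T WHERE X = 50").fetchone()[0]
guarded = conn.execute(
    "SELECT COUNT(*) FROM T WHERE X IS NOT NULL AND X = 50"
).fetchone()[0]

print(null_eq, plain, guarded)  # None 1 1
```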