Preemptive apologies for the nonsensical table/column names on these queries. If you’ve ever worked with the DB backend of Remedy, you’ll understand.
I’m having a problem where a COUNT(DISTINCT ...) returns null when I suspect the actual value should be somewhere in the 20s (23, I believe). Below is a series of queries and their return values.
SELECT count(distinct t442.c1)
FROM t442, t658, t631
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
Result = 497.
Add table t649 and make sure it has records linked back to table t442:
SELECT COUNT (DISTINCT t442.c1)
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
Result = 263.
Filter out records in table t649 where column c536870939 <= 1:
SELECT COUNT (DISTINCT t442.c1)
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
AND t649.c536870939 > 1
Result = 24.
Add a HAVING clause as a filter:
SELECT COUNT (DISTINCT t442.c1)
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
AND t649.c536870939 > 1
HAVING COUNT (DISTINCT t631.c536870922) =
COUNT (DISTINCT t649.c536870931)
Result = null.
If I run the following query, I can’t see anything in the result list that would explain why I’m not getting any kind of return value. This is true even if I remove the DISTINCT from the SELECT. (I get 25 rows back with the DISTINCT and 4265 rows without it.)
SELECT DISTINCT t442.c1, t631.c536870922, t649.c536870931
FROM t442, t658, t631, t649
WHERE t442.c1 = t658.c536870930
AND t442.c200000003 = 'Network'
AND t442.c536871139 < 2
AND t631.c536870913 = t442.c1
AND t658.c536870925 = 1
AND (t442.c7 = 6 OR t442.c7 = 5)
AND t442.c536870954 > 1141300800
AND (t442.c240000010 = 0)
AND t442.c1 = t649.c536870914
AND t649.c536870939 > 1
I have several other places where the query is set up exactly like the one that returns the null value, and it works perfectly fine, returning usable numbers that are the correct values. I have to assume that whatever is unique in this situation is related to the data and not the query itself, but I’m not sure what to look for in the data to explain it. I haven’t been able to find any null values in the raw data before aggregation. I don’t know what else would cause this.
Any help would be appreciated.
I understand now. Your problem in the original query is that it is highly unusual (if not, in fact, wrong) to use a HAVING clause without a GROUP BY clause. The answer lies in the order in which the various parts of the query are performed.
In the original query, you apply a HAVING clause with no GROUP BY. The database performs your joins and constraints first, and only then does any GROUP BY and aggregation operations. In this case, you are not grouping, so the COUNT operations run across the whole data set. Based on the values you posted above, COUNT(DISTINCT t631.c536870922) = 25 and COUNT(DISTINCT t649.c536870931) = 24. The HAVING clause is applied next, and nothing matches: you’re asking whether the counts over the total set (even though it contains multiple c1s) are equal, and they are not. The DISTINCT then gets applied to an empty result set, and you get nothing.
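To make that order of operations concrete, here is a minimal sketch on toy data, run through Python's sqlite3 module. The tables ticket, task and effort and their columns are hypothetical stand-ins for t442, t631 and t649, not the real Remedy schema. Because some SQLite versions reject HAVING without GROUP BY, the whole-set comparison is emulated here with a derived table and a WHERE, which should filter the single aggregate row in the same way.

```python
import sqlite3

# Toy stand-ins (hypothetical names): ticket ~ t442, task ~ t631, effort ~ t649.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (c1 TEXT);
    CREATE TABLE task   (ticket_id TEXT, task_id INTEGER);
    CREATE TABLE effort (ticket_id TEXT, task_ref INTEGER);
    INSERT INTO ticket VALUES ('A'), ('B');
    INSERT INTO task   VALUES ('A', 1), ('A', 2), ('B', 3), ('B', 4);
    INSERT INTO effort VALUES ('A', 1), ('A', 2), ('B', 3);
""")

# With no GROUP BY, every aggregate runs over the whole joined set.
totals_sql = """
    SELECT COUNT(DISTINCT ticket.c1)       AS tickets,
           COUNT(DISTINCT task.task_id)    AS tasks,
           COUNT(DISTINCT effort.task_ref) AS efforts
    FROM ticket, task, effort
    WHERE task.ticket_id = ticket.c1
      AND effort.ticket_id = ticket.c1
"""
totals = conn.execute(totals_sql).fetchone()

# Stand-in for HAVING without GROUP BY: keep the single aggregate row
# only if the two whole-set counts agree.
filtered = conn.execute(
    f"SELECT tickets FROM ({totals_sql}) WHERE tasks = efforts"
).fetchall()

print(totals)    # (2, 4, 3)
print(filtered)  # []
```

Ticket A has matching counts on its own, but the whole-set counts (4 versus 3) differ, so the one aggregate row is discarded and nothing comes back.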
What you really want is a version of the query you posted that spat out the row counts, grouped by t442.c1 so that the HAVING comparison is applied to each c1 rather than to the whole set.
This will give you a list of the c1 values that have equal numbers of t631 and t649 entries. Note: you should be very careful about the use of DISTINCT in your queries. For example, in the case where you posted the results above, it is completely unnecessary; oftentimes it acts as a kind of wallpaper to cover over errors in queries that don’t return results the way you want due to a missed constraint in the WHERE clause (“Hmm, my query is returning dupes for all these values. Well, a DISTINCT will fix that problem”).
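A minimal sketch of the grouped form, again on toy data through Python's sqlite3 module. The table and column names are hypothetical; in the thread's schema the grouping column would presumably be t442.c1 and the compared columns t631.c536870922 and t649.c536870931. The last query also shows how a join between two child tables multiplies rows, which is the kind of duplication a DISTINCT tends to paper over.

```python
import sqlite3

# Toy stand-ins (hypothetical names): ticket ~ t442, task ~ t631, effort ~ t649.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ticket (c1 TEXT);
    CREATE TABLE task   (ticket_id TEXT, task_id INTEGER);
    CREATE TABLE effort (ticket_id TEXT, task_ref INTEGER);
    INSERT INTO ticket VALUES ('A'), ('B');
    INSERT INTO task   VALUES ('A', 1), ('A', 2), ('B', 3), ('B', 4);
    INSERT INTO effort VALUES ('A', 1), ('A', 2), ('B', 3);
""")

# GROUP BY makes the aggregates (and the HAVING test) run once per c1.
matching = conn.execute("""
    SELECT ticket.c1
    FROM ticket, task, effort
    WHERE task.ticket_id = ticket.c1
      AND effort.ticket_id = ticket.c1
    GROUP BY ticket.c1
    HAVING COUNT(DISTINCT task.task_id) = COUNT(DISTINCT effort.task_ref)
    ORDER BY ticket.c1
""").fetchall()

# Join fan-out: each ticket contributes (tasks x efforts) rows to the join.
joined_rows, distinct_tickets = conn.execute("""
    SELECT COUNT(*), COUNT(DISTINCT ticket.c1)
    FROM ticket, task, effort
    WHERE task.ticket_id = ticket.c1
      AND effort.ticket_id = ticket.c1
""").fetchone()

print(matching)                       # [('A',)]
print(joined_rows, distinct_tickets)  # 6 2
```

To get a single number instead of a list, the grouped query can be wrapped in an outer SELECT COUNT(*) over it as a derived table.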