Can someone please explain the discrepancy among the following queries:
Query 1 (returns 87 results):
SELECT userId, COUNT(userId) as usercount
FROM `cpnc_PaymentOrder`
WHERE created >= UNIX_TIMESTAMP('2011-03-01')
GROUP BY userId
HAVING usercount > 1
Query 2 (returns 177 results):
SELECT userId, COUNT(userId) as usercount
FROM `cpnc_PaymentOrder`
WHERE created >= UNIX_TIMESTAMP('2011-02-01')
GROUP BY userId
HAVING usercount > 1
Query 3 (returns 55 results):
SELECT userId, COUNT(userId) as usercount
FROM `cpnc_PaymentOrder`
WHERE created >= UNIX_TIMESTAMP('2011-02-01')
AND created < UNIX_TIMESTAMP('2011-03-01')
GROUP BY userId
HAVING usercount > 1
Now I would think that the number of results from Query 2 minus the number of results from Query 1 would equal the number of results from Query 3. But this is not the case. Can someone please explain why?
Thanks, Jonah
EDIT:
For clarification, the query I want to write is:
SELECT userId
FROM `cpnc_PaymentOrder`
WHERE created >= UNIX_TIMESTAMP('2011-02-01')
AND created < UNIX_TIMESTAMP('2011-03-01')
AND userId "appears in at least one other record from before '2011-03-01'"
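Because you’re comparing the number of groups (users with more than one order in each date range), not the number of rows, and group counts don’t add up across date ranges.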
Let’s see this on a small example:
For this dataset, the first query returns 2 rows, the second also returns 2 rows, and the third also returns 2 rows.
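As an illustration, here is one hypothetical dataset that yields those counts. The table name demo_PaymentOrder, its columns, and the inserted rows are assumptions made up for this sketch (the real cpnc_PaymentOrder schema isn't shown in the question); only the three queries follow the question.

```sql
-- Hypothetical demo table; created is assumed to hold a Unix timestamp,
-- as the WHERE clauses in the question imply.
CREATE TABLE demo_PaymentOrder (
  id      INT AUTO_INCREMENT PRIMARY KEY,
  userId  INT NOT NULL,
  created INT NOT NULL
);

-- Two users, each with two orders in February and two in March.
INSERT INTO demo_PaymentOrder (userId, created) VALUES
  (1, UNIX_TIMESTAMP('2011-02-05')),
  (1, UNIX_TIMESTAMP('2011-02-20')),
  (1, UNIX_TIMESTAMP('2011-03-03')),
  (1, UNIX_TIMESTAMP('2011-03-15')),
  (2, UNIX_TIMESTAMP('2011-02-07')),
  (2, UNIX_TIMESTAMP('2011-02-21')),
  (2, UNIX_TIMESTAMP('2011-03-04')),
  (2, UNIX_TIMESTAMP('2011-03-18'));

-- Query 1 (since March): 2 groups, users 1 and 2 with 2 orders each.
SELECT userId, COUNT(userId) AS usercount
FROM demo_PaymentOrder
WHERE created >= UNIX_TIMESTAMP('2011-03-01')
GROUP BY userId
HAVING usercount > 1;

-- Query 2 (since February): still 2 groups, users 1 and 2 with 4 orders each.
SELECT userId, COUNT(userId) AS usercount
FROM demo_PaymentOrder
WHERE created >= UNIX_TIMESTAMP('2011-02-01')
GROUP BY userId
HAVING usercount > 1;

-- Query 3 (February only): again 2 groups, users 1 and 2 with 2 orders each.
SELECT userId, COUNT(userId) AS usercount
FROM demo_PaymentOrder
WHERE created >= UNIX_TIMESTAMP('2011-02-01')
  AND created < UNIX_TIMESTAMP('2011-03-01')
GROUP BY userId
HAVING usercount > 1;
```

Here 2 - 2 is 0, not 2: the same users form a group in every date range, so subtracting group counts tells you nothing about the range in between.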
Just remove your GROUP BY (along with the HAVING that depends on it) and compare the row counts instead: without GROUP BY the math will match, of course.
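A minimal sketch of that row-level check against the table from the question. The column aliases are made up for illustration, and the SUM(...) trick relies on MySQL evaluating a comparison to 1 or 0.

```sql
-- Count rows, not groups, for the same three date ranges.
SELECT
  SUM(created >= UNIX_TIMESTAMP('2011-03-01')) AS rows_since_march,    -- range of Query 1
  COUNT(*)                                     AS rows_since_february, -- range of Query 2
  SUM(created <  UNIX_TIMESTAMP('2011-03-01')) AS rows_in_february     -- range of Query 3
FROM `cpnc_PaymentOrder`
WHERE created >= UNIX_TIMESTAMP('2011-02-01');
```

Every row from February onward falls into exactly one of the two sub-ranges, so rows_since_february minus rows_since_march should always equal rows_in_february. The same does not hold for groups, because one user can contribute rows to both sub-ranges.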