I have noticed some strange behaviour while trying to GROUP BY a VARCHAR column.
Take the following example, where I try to find customers who have changed their name at least once in the past:
```sql
CREATE TABLE #CustomersHistory
(
    Id INT IDENTITY(1,1),
    CustomerId INT,
    Name VARCHAR(200)
)

INSERT INTO #CustomersHistory VALUES (12, 'AAA')
INSERT INTO #CustomersHistory VALUES (12, 'AAA')
INSERT INTO #CustomersHistory VALUES (12, 'BBB')
INSERT INTO #CustomersHistory VALUES (44, '444')

SELECT ch.CustomerId, count(ch.Name) AS cnt
FROM #CustomersHistory ch
GROUP BY ch.CustomerId HAVING count(ch.Name) != 1
```
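This oddly yields the counts below, as if the 'AAA' from the first INSERT were different from the one in the second. (Strictly, the HAVING clause removes customer 44, whose count is 1; its row is what the query returns without HAVING.)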
```
CustomerId  cnt  (I was expecting)
12          3    2
44          1    1
```
- Is this behaviour specific to T-SQL?
- Why does it behave in this rather counter-intuitive way?
- What is the customary way to overcome this limitation?
Note: This question is very similar to "GROUP BY problem with varchar", where I didn't find an answer to the *why*.
Side note: is it good practice to use `HAVING count(ch.Name) != 1` instead of `HAVING count(ch.Name) > 1`?
The `COUNT()` function counts every row in the group whose value is not NULL, duplicates included; it does not care whether two values are equal. I think you want `COUNT(DISTINCT ch.Name)`, which counts only unique names. For more information, take a look at the COUNT() article in Books Online.
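A minimal sketch of the fix, assuming SQL Server 2008 or later (for the multi-row VALUES syntax). It recreates the sample data from the question so it can run on its own, puts `COUNT(*)`, `COUNT(Name)` and `COUNT(DISTINCT Name)` side by side so the difference is visible, and then uses `COUNT(DISTINCT ...)` in the HAVING clause. The column aliases (`all_rows`, `non_null_names`, `distinct_names`) are illustrative names, not anything from the original post.

```sql
-- Recreate the sample data from the question (SQL Server temp table).
IF OBJECT_ID('tempdb..#CustomersHistory') IS NOT NULL
    DROP TABLE #CustomersHistory;

CREATE TABLE #CustomersHistory
(
    Id         INT IDENTITY(1,1),
    CustomerId INT,
    Name       VARCHAR(200)
);

INSERT INTO #CustomersHistory (CustomerId, Name)
VALUES (12, 'AAA'), (12, 'AAA'), (12, 'BBB'), (44, '444');

-- Compare the three COUNT variants per customer:
--   COUNT(*)                counts every row in the group
--   COUNT(ch.Name)          counts rows whose Name is not NULL, duplicates included
--   COUNT(DISTINCT ch.Name) counts unique non-NULL names
SELECT ch.CustomerId,
       COUNT(*)                AS all_rows,
       COUNT(ch.Name)          AS non_null_names,
       COUNT(DISTINCT ch.Name) AS distinct_names
FROM #CustomersHistory ch
GROUP BY ch.CustomerId;
-- customer 12 -> 3, 3, 2
-- customer 44 -> 1, 1, 1

-- Customers who have used more than one distinct name.
SELECT ch.CustomerId, COUNT(DISTINCT ch.Name) AS cnt
FROM #CustomersHistory ch
GROUP BY ch.CustomerId
HAVING COUNT(DISTINCT ch.Name) > 1;
-- customer 12 -> 2
```

On the side note: with `COUNT(DISTINCT ...)`, `> 1` is the safer choice. A customer whose Name values are all NULL gets a distinct count of 0, which `!= 1` would keep but `> 1` correctly excludes.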