So I was trying to explain to some people why this query is a bad idea:
SELECT z.ReportDate, z.Zipcode, SUM(z.Sales) AS Sales,
COALESCE(
(SELECT TOP (1) GroupName
FROM dbo.zipGroups
WHERE (Zipcode = z.Zipcode)), 'Unknown') AS GroupName,
COALESCE(
(SELECT TOP (1) GroupCode
FROM dbo.zipGroups
WHERE (Zipcode = z.Zipcode)), 0) AS GroupNumber
FROM dbo.Report_ByZipcode AS z
GROUP BY z.ReportDate, z.Zipcode
and suggesting a better way to write it, when my boss ended the discussion with, “Well, it’s been returning the right data for the last year and we haven’t had any problems with it, so it’s fine.”
At which point I thought to myself, how in the world is that even possible?
After some digging, I discovered these facts:
- This query is supposed to group sales by Zipcode and date, and link those to the largest Group (by population size) that a Zipcode is assigned to by way of the zipGroups table.
- Each Zipcode can be assigned to 0 to many Groups, and if a Zipcode is assigned to 0 Groups, it’s simply not in the zipGroups table.
- A Group is a geographical area, and the GroupNumbers are ranked by largest to smallest by population (for example, the group covering the NY-NJ-CT tri-state area is GroupNumber 1, and North Platte, Nebraska is GroupNumber 209).
- The zipGroups table has not changed in at least 2 years.
- The zipGroups table has a clustered index with Zipcode, GroupNumber (ascending) as the keys.
- The combination of Zipcode, GroupNumber is unique in zipGroups.
So my question has 2 parts.
A) Even though there are no ORDER BY clauses in those SELECT TOP queries, are they actually deterministic because the clustered index is basically providing it a default ORDER BY?
B1) If that is true, is the query, however precariously, actually doing what it’s supposed to do?
B2) If that is not true, can you help me prove it?
Note: I’ve already re-written this to use joins, so I don’t need the SQL to fix it, I need to get it into production so I stop worrying about it breaking.
SQL Server makes no guarantees about the ordering of records in the absence of ORDER BY. It might yield the correct results 999,999 times and then fail on the millionth try. Don’t do it.