The intention of following (simplified) code fragment is to return one random row.
Unfortunatly, when we run this fragment in the query analyzer, it returns between zero and three results.
As our input table consists of exactly 5 rows with unique ID’s and as we perform a select on this table where ID equals a random number, we are stumped that there would ever be more than one row returned.
Note: among other things, we already tried casting the checksum result to an integer with no avail.
DECLARE @Table TABLE (
ID INTEGER IDENTITY (1, 1)
, FK1 INTEGER
)
INSERT INTO @Table
SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3
UNION ALL SELECT 4
UNION ALL SELECT 5
SELECT *
FROM @Table
WHERE ID = ABS(CHECKSUM(NEWID())) % 5 + 1
Edit
Our usage scenario is as follows (please don’t comment on wether it is the right thing to do or not. It’s the powers that be that have decided)
Ultimately, we must create a result with realistic values where the combination of producer and weights are obfuscated by selecting at random existing weights from the table itself.
The query then would become something like this (also a reason why RAND can not be used)
SELECT t.ID
, FK1 = (SELECT FK1 FROM @Table WHERE ID=ABS(CHECKSUM(NEWID())) % 5 + 1)
FROM @Table t
Because the inner select could be returning zero results, it would return a NULL value wich again is not acceptable. It is the investigation of why the inner select returns between zero and x results, that this question sproused (is this even English?).
Answer
What turned the light on for me was the simple observation that ABS(CHECKSUM(NEWID())) % 5 + 1) was re-evaluated for each row. I was under the impression that ABS(CHECKSUM(NEWID())) % 5 + 1) would get evaluated once, then matched.
Thank you all for answering and slowly but surely leading me to a better understanding.
The reason this happens is because NEWID() gies a different value for each row in the table. For each row, independently of the others, there is a one in five chance of it being returned. Consequently, as it stands, you actually have a 1 in 3125 chance of all 5 rows being returned!
To see this, run the following query. You’ll see that each row gets a different ID.
This will fix your code:
However, I’m not sure this is the most efficient method of selecting a single random row from the table.
You might find this MSDN article useful: http://msdn.microsoft.com/en-us/library/Aa175776 (Random Sampling in T-SQL)
EDIT 1: now I think about it, this probably is the most efficient way to do it, assuming the number of rows remains fixed and the IDs are guaranteed to be contiguous.
EDIT 2: to achieve the desired result when used as a sub-query, use TOP 1 like this: