If you run the following sample code in SQL Server, you’ll notice that newid() materializes after the join, whereas row_number() materializes before the join. Does anyone understand why this happens, and is there a way to work around it?
declare @a table ( num varchar(10) )
insert into @a values ('dan')
insert into @a values ('dan')
insert into @a values ('fran')
insert into @a values ('fran')
select *
from @a T
inner join
(select num, newid() id
from @a
group by num) T1 on T1.num = T.num
select *
from @a T
inner join
(select num, row_number() over (order by num) id
from @a
group by num) T1 on T1.num = T.num
Not sure I see what the problem is here. Materialize the subquery T1 first and you get two rows, one per distinct num value.
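For reference, here is a minimal sketch of running the row_number() subquery on its own, assuming the same sample table as in the question. The result shown in the comments follows from the four sample rows and assumes the default collation orders 'dan' before 'fran'.

```sql
-- Same sample data as in the question
declare @a table ( num varchar(10) )
insert into @a values ('dan')
insert into @a values ('dan')
insert into @a values ('fran')
insert into @a values ('fran')

-- The derived table T1 from the second query, evaluated by itself
select num, row_number() over (order by num) id
from @a
group by num

-- num   id
-- dan   1
-- fran  2
```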
Now join that against @a on num = num and you get 4 rows, 2 for each distinct value. What is your actual goal here? Perhaps you should be applying ROW_NUMBER() outside the subquery?
The order of materialization is up to the optimizer. You’ll find that other built-ins (RAND(), GETDATE(), etc.) have similarly inconsistent materialization behavior. There is not much you can do about it, and not much chance they’re going to “fix” it.
EDIT
New code sample. Write the contents of @a to a #temp table to “materialize” the NEWID() assignment per unique num value.
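A minimal sketch of that approach, assuming the sample table from the question and a hypothetical temp table named #ids (the name is illustrative only). The idea is that SELECT ... INTO stores one NEWID() value per distinct num, so the later join reads stored values instead of re-evaluating NEWID().

```sql
-- Same sample data as in the question
declare @a table ( num varchar(10) )
insert into @a values ('dan')
insert into @a values ('dan')
insert into @a values ('fran')
insert into @a values ('fran')

-- Persist one NEWID() per distinct num.
-- #ids is a hypothetical name chosen for this sketch.
select num, newid() id
into #ids
from @a
group by num

-- Join against the stored values: both 'dan' rows share one id,
-- and both 'fran' rows share another.
select T.num, T1.id
from @a T
inner join #ids T1 on T1.num = T.num

-- Clean up the temp table
drop table #ids
```

Because the GUIDs are written to #ids before the join runs, the optimizer has no opportunity to evaluate NEWID() once per joined row.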