We have a fairly large machine with 100GB+ of memory and 8+ cores. Server-wide MAXDOP = 8.
T_SEQ_FF rowcount = 61692209, size = 2991152 KB
UPD 1:
Table T_SEQ_FF has two indexes:
1) create index idx_1 on T_SEQ_FF (first_num)
2) create index idx_2 on T_SEQ_FF (second_num)
Table T_SEQ_FF holds (first_num, second_num) pairs of numbers that should be chained into sequences by the following recursive CTE:
;with first_entity as (
select first_num from T_SEQ_FF a where not exists (select 1 from T_SEQ_FF b where a.first_num = b.second_num)
) ,
cte as (
select a.first_num, a.second_num, a.first_num as first_key, 1 as sequence_count
from T_SEQ_FF a inner join first_entity b on a.first_num = b.first_num
union all
select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1
from T_SEQ_FF a
inner join cte on a.first_num = cte.second_num
)
select *
from cte
option (maxrecursion 0);
But when I run this query, I only see a serial query plan, with no parallelism.
If I remove the second (recursive) part of the CTE from the query above:
union all
select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1
from T_SEQ_FF a
inner join cte on a.first_num = cte.second_num
then the query plan becomes parallel, using Repartition Streams and Gather Streams.
So I conclude that the recursive CTE is what prevents SQL Server from using parallelism for this query.
I believe that on such a large machine with plenty of free resources, parallelism should help the query finish faster.
Currently it runs for ~40-50 minutes.
Could you advise how to use as many resources as possible to finish the query faster?
A CTE is the only option because we need to build sequences from first_num - second_num pairs, and those sequences can be of any length.
I would try rewriting the CTE to remove one of the steps i.e.
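One possible reading of this suggestion, offered only as a sketch: fold the first_entity lookup into the anchor member, so the anchor no longer joins T_SEQ_FF back to a separate CTE. This assumes first_num is unique in T_SEQ_FF. If it is not, the original join returns more anchor rows than this version does. The explicit column list in the final select is also my own choice.

```sql
-- Sketch, not a tested rewrite: the anchor member finds chain heads directly,
-- so the separate first_entity CTE and its join are no longer needed.
-- Assumption: first_num is unique in T_SEQ_FF.
;with cte as (
    -- anchor: rows whose first_num never appears as a second_num
    select a.first_num, a.second_num, a.first_num as first_key, 1 as sequence_count
    from T_SEQ_FF a
    where not exists (select 1 from T_SEQ_FF b where b.second_num = a.first_num)
    union all
    -- recursive step: follow second_num to the next pair in the chain
    select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1
    from T_SEQ_FF a
    inner join cte on a.first_num = cte.second_num
)
select first_num, second_num, first_key, sequence_count
from cte
option (maxrecursion 0);
```

This removes one join from the anchor only. The recursive member is unchanged, so it may not change whether the plan goes parallel.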
If there is only one root element, it would be better to pass it into the query as a variable so that the query optimizer can use the value.
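A minimal sketch of the single-root case, under stated assumptions: the variable name @root_first_num, its bigint type, and the value 1 are all hypothetical, since the post does not give the column types or the root value. The recompile hint is my addition. As far as I know, the optimizer does not see the runtime value of a local variable at compile time unless the statement is recompiled.

```sql
-- Hypothetical name, type, and value: adjust to the real column type and root.
declare @root_first_num bigint = 1;

;with cte as (
    -- anchor: start only from the known root, with no NOT EXISTS probe
    select a.first_num, a.second_num, a.first_num as first_key, 1 as sequence_count
    from T_SEQ_FF a
    where a.first_num = @root_first_num
    union all
    -- recursive step: same as in the original query
    select a.first_num, a.second_num, cte.first_key, cte.sequence_count + 1
    from T_SEQ_FF a
    inner join cte on a.first_num = cte.second_num
)
select first_num, second_num, first_key, sequence_count
from cte
-- recompile (an assumption, not from the original answer) lets the optimizer
-- compile the statement with the variable's current value
option (maxrecursion 0, recompile);
```

With idx_1 on first_num, the anchor should be able to seek straight to the root row instead of scanning the table.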
Another thing to try is to change the query so that it gets the root elements without a subquery, e.g. with a predicate such as second_num is null or first_num = second_num, provided the data marks root rows that way.