In my project I've run into a problem with the T-SQL code below.
- Step 1 populates the UserModules table with parent modules and their subscribed users.
- Step 2 looks up, in the Modules_Hierarchy table, the child modules of the modules inserted in step 1, and inserts valid records into the UserModules table by mapping each child module to the users subscribed to its parent module. This step repeats until no more child modules are found.
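To make the intent of steps 1 and 2 concrete, here is a minimal sketch in plain Python of the result they compute, using sets instead of tables. The function name `propagate` and the toy data are hypothetical. Each pass mirrors one iteration of the WHILE loop, and the loop stops when a pass adds no new (module, user) pair, which is what `@cnt = 0` signals.

```python
def propagate(subscriptions, hierarchy):
    """Return all (module, user) pairs after repeating step 2 to a fixed point.

    subscriptions: set of (module, user) pairs produced by step 1 (hypothetical data)
    hierarchy:     set of (parent_module, child_module) edges
    """
    user_modules = set(subscriptions)  # step 1 output
    while True:
        # Step 2: map each child module to the users of its parent module,
        # keeping only pairs not already in user_modules (the NOT EXISTS check).
        new_pairs = {
            (child, user)
            for (parent, child) in hierarchy
            for (module, user) in user_modules
            if module == parent
        } - user_modules
        if not new_pairs:  # corresponds to @cnt = 0 -> BREAK
            return user_modules
        user_modules |= new_pairs  # corresponds to INSERT INTO UserModules


# Hypothetical data: module 1 -> 2 -> 3; module 4 has no children.
hierarchy = {(1, 2), (2, 3)}
subscriptions = {(1, "alice"), (4, "bob")}
print(sorted(propagate(subscriptions, hierarchy)))
# [(1, 'alice'), (2, 'alice'), (3, 'alice'), (4, 'bob')]
```

Note that every pass re-joins the whole accumulated set, just as the loop re-reads all of UserModules on each iteration, which is one reason the cost grows with the table.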
Problem:
In step 2, the WHILE loop condition and the SELECT statement use correlated subqueries, and the UserModules table is referenced both in the INSERT and in its SELECT clause. This hurts performance, and the query frequently fails with the lock-resource error below.
The final UserModules table holds 42 million rows, and it is expected to grow.
Error Message: “The instance of the SQL Server Database Engine cannot obtain a LOCK resource at this time. Rerun your statement when there are fewer active users. Ask the database administrator to check the lock and memory configuration for this instance, or to check for long-running transactions.”
How can I optimize this query, i.e. step 2, to resolve the issue?
Step 1:
INSERT INTO UserModules(ModuleID, UserID)
SELECT ModuleID, UserID
FROM TABLEA a
INNER JOIN TABLEB b ON a.ID = b.ID
Step 2:
DECLARE @cnt int
SET @cnt = 1
WHILE (@cnt > 0)
BEGIN
    -- Count child modules whose parent's users are not yet mapped to them
    SET @cnt = (SELECT COUNT(DISTINCT s.moduleid)
                FROM Modules_Hirarchy s WITH (nolock), Modules t
                WHERE s.ParentModuleId = t.ModuleId
                  AND NOT EXISTS
                      (SELECT 1
                       FROM UserModules r
                       WHERE s.moduleid = r.moduleid
                         AND t.EndUserId = r.EndUserId)
                  AND s.moduleid + t.EndUserId NOT IN
                      (SELECT CAST(ModuleId AS varchar) + EndUserId
                       FROM UserModules))
    IF @cnt = 0
        BREAK
    -- Map each child module to the users subscribed to its parent module
    INSERT INTO UserModules (ModuleId, EndUserId)
    SELECT DISTINCT s.moduleid, t.EndUserId
    FROM Modules_Hirarchy s WITH (nolock), UserModules t
    WHERE s.ParentModuleId = t.ModuleId
      AND NOT EXISTS
          (SELECT 1
           FROM UserModules r
           WHERE s.moduleid = r.moduleid
             AND t.EndUserId = r.EndUserId)
END
It's hard to say :) there are too many variables,
but here are some things you should do to make the query efficient:
- Create separate nonclustered indexes on the columns ModuleID, ParentModuleID and ChildModuleID.
- You probably don't want to query all of the groups, only explicit ones, so filter out as many groups as possible in the anchor statement:
SELECT a.ModuleID, a.UserID, CAST(NULL AS int) AS parentModule
FROM #UserModules a
JOIN #Modules_Hirarchy b ON a.ModuleID = b.ChildModuleID
WHERE b.ParentModuleID IS NULL AND a.ModuleId IN (listOfModules)
- Add a unique index on (ParentModuleID, ChildModuleID), since non-unique rows there can lead to a huge amount of row duplication.
Apart from that, it depends on the data selectivity of ParentModuleID and ChildModuleID, but you can't do much about that.
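The duplication risk behind the unique-index advice can be reproduced with a small SQLite sketch run from Python. The table mirrors the answer's column names, while the data and the function `count_tree_rows` are hypothetical. Because a recursive CTE combines its anchor and recursive member with UNION ALL, every duplicated parent/child edge doubles the rows produced beneath it, and a unique index on (ParentModuleID, ChildModuleID) rejects the duplicates up front.

```python
import sqlite3


def count_tree_rows(unique_edges: bool) -> int:
    """Count the rows a recursive CTE produces when walking down from module 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Modules_Hirarchy (ParentModuleID INTEGER, ChildModuleID INTEGER)"
    )
    if unique_edges:
        conn.execute(
            "CREATE UNIQUE INDEX ux_parent_child "
            "ON Modules_Hirarchy (ParentModuleID, ChildModuleID)"
        )
    # Hypothetical data: every edge of the chain 1 -> 2 -> 3 is stored twice.
    for edge in [(1, 2), (1, 2), (2, 3), (2, 3)]:
        try:
            conn.execute("INSERT INTO Modules_Hirarchy VALUES (?, ?)", edge)
        except sqlite3.IntegrityError:
            pass  # the unique index rejects the duplicate edge
    total = conn.execute(
        """
        WITH RECURSIVE tree(ModuleID) AS (
            SELECT 1                                  -- anchor: root module 1
            UNION ALL
            SELECT h.ChildModuleID                    -- recursive member
            FROM tree t
            JOIN Modules_Hirarchy h ON h.ParentModuleID = t.ModuleID
        )
        SELECT COUNT(*) FROM tree
        """
    ).fetchone()[0]
    conn.close()
    return total


print(count_tree_rows(unique_edges=False))  # 7: 1 + 2 + 4 rows
print(count_tree_rows(unique_edges=True))   # 3: one row per module
```

With two duplicated levels the row count already more than doubles; on a deep hierarchy the growth compounds at every level.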
I think it will work fine for big data sets, as the predicates are simple, as long as data selectivity is high.
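The answer shows only the anchor statement. Below is a minimal sketch of a complete recursive CTE built around it, run against SQLite from Python so it can be tried without a SQL Server instance. Assumptions: root modules are stored in the hierarchy table with a NULL parent (as the anchor's `ParentModuleID IS NULL` filter implies), users are in a `UserID` column, and the explicit list `(1, 4)` stands in for `listOfModules`; all data is hypothetical. It replaces the WHILE loop with a single INSERT, but it is a sketch of the query's shape, not a measured fix for the lock error. In T-SQL the same statement is written without the RECURSIVE keyword, and because SQL Server stops a recursive CTE after 100 levels by default, deep hierarchies would need an `OPTION (MAXRECURSION ...)` hint on the INSERT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schemas following the column names used in the answer.
    CREATE TABLE UserModules (ModuleID INTEGER, UserID TEXT);
    CREATE TABLE Modules_Hirarchy (ParentModuleID INTEGER, ChildModuleID INTEGER);
    -- Root modules are stored with a NULL parent, as the anchor query assumes.
    INSERT INTO Modules_Hirarchy VALUES (NULL, 1), (1, 2), (2, 3), (NULL, 4);
    -- Step 1 output: users subscribed to the root modules.
    INSERT INTO UserModules VALUES (1, 'alice'), (4, 'bob');
""")

conn.execute("""
    WITH RECURSIVE tree(ModuleID, UserID, parentModule) AS (
        -- Anchor: root modules, filtered to an explicit (hypothetical) list.
        SELECT a.ModuleID, a.UserID, CAST(NULL AS INTEGER)
        FROM UserModules a
        JOIN Modules_Hirarchy b ON a.ModuleID = b.ChildModuleID
        WHERE b.ParentModuleID IS NULL AND a.ModuleID IN (1, 4)
        UNION ALL
        -- Recursive member: hand each module's users down to its children.
        SELECT h.ChildModuleID, t.UserID, t.ModuleID
        FROM tree t
        JOIN Modules_Hirarchy h ON h.ParentModuleID = t.ModuleID
    )
    INSERT INTO UserModules (ModuleID, UserID)
    SELECT DISTINCT t.ModuleID, t.UserID
    FROM tree t
    WHERE t.parentModule IS NOT NULL          -- anchor rows already exist
      AND NOT EXISTS (SELECT 1 FROM UserModules u
                      WHERE u.ModuleID = t.ModuleID AND u.UserID = t.UserID)
""")
conn.commit()

rows = conn.execute(
    "SELECT ModuleID, UserID FROM UserModules ORDER BY ModuleID"
).fetchall()
print(rows)  # [(1, 'alice'), (2, 'alice'), (3, 'alice'), (4, 'bob')]
```

The whole hierarchy is walked in one statement, so UserModules is read once for the anchor and once for the NOT EXISTS check instead of being re-scanned on every loop iteration.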