I have a set of data from a table (TableA) which relates to itself through TableB. Parents in TableA have children in TableA. Those children might also have children. Nothing amazing here.
I have a top-level set of rows from TableA that I need to operate on. Before I can operate on those rows, I must have each child row on hand. I must be able to operate on each top-level row of TableA (and its children) as fast as possible in my application.
I can’t find a way to do this.
A recursive CTE (top-level TableA set as the anchor, TableB->TableA join as the recursive member) does not fulfill the requirements. The CTE returns the entire top-level set from TableA before it works on level 2 of the children, then level 3, then level 4, and so on. Since my top-level set is more than 400,000 rows, my client application cannot begin working on rows until the ENTIRE dataset has been batched up on the server.
I need a better way to do this. I’ve tried streaming a flat set of top-level TableA rows to the client, and having the client issue the recursive CTE statement repeatedly for each top-level TableA row. This actually works. But there’s too much noise. The sustained row retrieval rate is too large due to the repeated reissuing of statements.
I need a creative solution.
Snippet of the per-record CTE I’m using. In this example, TableA is Member, and TableB is MemberReplacement. I ripped out most of the select statement in the middle, and most of the joins.
WITH T_MemberRecurse
(
    MemberId,
    IncludedMemberId,
    Level
) AS (
    -- Anchor: the top-level set of members
    SELECT Member.Id,
           Member.Id,
           0
    FROM MemberInput
    INNER JOIN MemberInputItem
        ON MemberInputItem.MemberInputId = MemberInput.Id
    INNER JOIN Member
        ON Member.Id = MemberInputItem.MemberId
    UNION ALL
    -- Recursive member: follow MemberReplacement to the original member
    SELECT T_MemberRecurse.MemberId,
           Member2.Id,
           Level + 1
    FROM T_MemberRecurse
    INNER JOIN Member
        ON Member.Id = T_MemberRecurse.IncludedMemberId
    INNER JOIN MemberReplacement
        ON MemberReplacement.MemberId = Member.Id
    INNER JOIN Member Member2
        ON Member2.Id = MemberReplacement.OriginalMemberId
)
SELECT Member.Id,
       T_MemberRecurse.IncludedMemberId,
       T_MemberRecurse.Level -- remaining columns omitted
FROM MemberInput
INNER JOIN LotsOfTables -- remaining joins omitted
I’m still thinking about this, but first, a stab in the dark that could help. It is based on my experience with linked servers, where forcing row-by-row operations improved performance by two orders of magnitude.
Turn your CTE into a rowset-returning function with one parameter, the desired Member Id.
Then:
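A minimal sketch of that idea, not a drop-in solution. The function name dbo.MemberRecurse, the @MemberId parameter, and its int type are assumptions made for illustration. The sketch also assumes MemberReplacement.OriginalMemberId always refers to an existing Member.Id, so the extra joins to Member from the original CTE are left out:

```sql
-- Hypothetical inline table-valued function wrapping the recursive CTE.
-- dbo.MemberRecurse, @MemberId and the int type are assumed names/types.
CREATE FUNCTION dbo.MemberRecurse (@MemberId int)
RETURNS TABLE
AS
RETURN
(
    WITH T_MemberRecurse (MemberId, IncludedMemberId, Level) AS
    (
        -- Anchor: only the single requested top-level member.
        SELECT Member.Id, Member.Id, 0
        FROM Member
        WHERE Member.Id = @MemberId

        UNION ALL

        -- Recursive member: follow MemberReplacement one level further.
        SELECT T_MemberRecurse.MemberId,
               MemberReplacement.OriginalMemberId,
               T_MemberRecurse.Level + 1
        FROM T_MemberRecurse
        INNER JOIN MemberReplacement
            ON MemberReplacement.MemberId = T_MemberRecurse.IncludedMemberId
    )
    SELECT MemberId, IncludedMemberId, Level
    FROM T_MemberRecurse
);
GO

-- Invoke the function once per top-level row with CROSS APPLY.
SELECT R.MemberId,
       R.IncludedMemberId,
       R.Level
FROM MemberInput
INNER JOIN MemberInputItem
    ON MemberInputItem.MemberInputId = MemberInput.Id
CROSS APPLY dbo.MemberRecurse(MemberInputItem.MemberId) AS R;
```

Whether the optimizer actually evaluates this one top-level row at a time depends on the plan it chooses, so check the execution plan before relying on it.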
Please let me know if this works. The idea is to force the engine to traverse depth-first rather than breadth-first. It might lower overall server performance, but in theory it should let your client begin working with some rows of data sooner.
Update
Second idea: get the parent and child information separately and perform, logically, a merge join in the client. (That is, an ordered nested loop that advances the ordered second/inner input only until it no longer matches the current outer row.) Get smaller chunks at a time using key ranges or row_number. Or get the entire parent set first, then get the child rows in smaller sets.
Update 2
Idea 3: Instead of a recursive CTE, use 5 plain vanilla joins to get all the data you need. It sounds awful, but it should let you use the FAST 100 query hint so the client can get started on the data.
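A rough sketch of what those joins could look like, assuming the replacement chains are never more than five levels deep. The aliases R1 through R5 and the LevelNId column names are made up for illustration:

```sql
-- Sketch only: assumes no replacement chain is deeper than five levels.
-- Each result row is one path from a top-level member down the chain;
-- levels past the end of a chain come back as NULL.
SELECT Member.Id           AS MemberId,
       R1.OriginalMemberId AS Level1Id,
       R2.OriginalMemberId AS Level2Id,
       R3.OriginalMemberId AS Level3Id,
       R4.OriginalMemberId AS Level4Id,
       R5.OriginalMemberId AS Level5Id
FROM MemberInput
INNER JOIN MemberInputItem
    ON MemberInputItem.MemberInputId = MemberInput.Id
INNER JOIN Member
    ON Member.Id = MemberInputItem.MemberId
LEFT JOIN MemberReplacement AS R1
    ON R1.MemberId = Member.Id
LEFT JOIN MemberReplacement AS R2
    ON R2.MemberId = R1.OriginalMemberId
LEFT JOIN MemberReplacement AS R3
    ON R3.MemberId = R2.OriginalMemberId
LEFT JOIN MemberReplacement AS R4
    ON R4.MemberId = R3.OriginalMemberId
LEFT JOIN MemberReplacement AS R5
    ON R5.MemberId = R4.OriginalMemberId
-- Ask the optimizer to favor a plan that returns the first 100 rows quickly.
OPTION (FAST 100);
```

Because a member with several replacements produces several rows that share the same leading columns, the client would need to collapse repeated ancestors when it rebuilds each tree.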