I am looking to optimise a concatenation across multiple rows, and having read some similar questions am familiar with using STUFF + XML path etc. However, when I apply these to my query it usually times out when applying to the 9 million rows or so rows I have
What I’m looking for is a more efficient way of translating this:
create table #fruit
(
Contact_id NVARCHAR(50)
,fruit_type NVARCHAR(50)
,[2005_orders] int
,[2006_orders] int
,[2007_orders] int
,[2008_orders] int
,[2009_orders] int
)
INSERT INTO #fruit VALUES ('id001','banana',1,3,0,25,4)
INSERT INTO #fruit VALUES ('id001','apple',0,7,19,1,0)
INSERT INTO #fruit VALUES ('id001','orange',0,0,0,9,0)
INSERT INTO #fruit VALUES ('id001','strawberry',1,1,1,1,4)
INSERT INTO #fruit VALUES ('id001','grapes',0,3,0,0,0)
INSERT INTO #fruit VALUES ('id001','lemon',1,1,1,0,0)
Into this:
CREATE TABLE #results
(
contact_id NVARCHAR(255)
,fruit_type NVARCHAR(50)
,[2005_orders] int
,[2006_orders] int
,[2007_orders] int
,[2008_orders] int
,[2009_orders] int
,combination2005 NVARCHAR(500)
,combination2006 NVARCHAR(500)
,combination2007 NVARCHAR(500)
,combination2008 NVARCHAR(500)
,combination2009 NVARCHAR(500)
)
INSERT INTO #results VALUES ('id001','banana',1,3,0,25,4,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','apple',0,7,19,1,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','orange',0,0,0,9,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','strawberry',1,1,1,1,4,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','grapes',0,3,0,0,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
INSERT INTO #results VALUES ('id001','lemon',1,1,1,0,0,'banana + strawberry + lemon','banana + apple + strawberry + grapes + lemon','apple + strawberry + lemon','banana + apple + orange + strawberry','banana + strawberry')
Where the key things to factor in are that I want a row per fruit type per contact (as this table will be used elsewhere) and that I only want a fruit to make it into the combination type if the count is greater than 0.
It might be that this isn’t ever going to be very efficient given the number of rows I’m dealing with, but if there’s any chance I can append this information onto my table that would be great 🙂
Methods tried
Method 1)
SELECT *
,STUFF(
(SELECT ' ' + fruit_type
FROM #fruit fr2
WHERE fr.contact_id = fr2.contact_id
AND 2005_orders > 0
order by contact_id,fruit_type
FOR XML path ('')
)
,1,1,''
) AS combination
FROM #fruit fr
Method 2)
SELECT *
,ISNULL((MAX(CASE WHEN fruit_type = 'banana' AND 2005_orders > 0 THEN 'banana ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'apple' AND 2005_orders > 0 THEN 'apple ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'orange' AND 2005_orders > 0 THEN 'orange' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'strawberry' AND 2005_orders > 0 THEN 'strawberry ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'grapes' AND 2005_orders > 0 THEN 'grapes ' END) OVER (PARTITION BY contact_id)),'')+
ISNULL((MAX(CASE WHEN fruit_type = 'lemon' AND 2005_orders > 0 THEN 'lemon ' END) OVER (PARTITION BY contact_id)),'')+
AS combination05
FROM #fruit fr
— which is then repeated for years 2006-2009 (which I know is hideously inefficient!)
The performance issue with both of your methods is going to be the subquery. Try this strategy to break it apart and avoid subqueries.
You don’t need to use outer joins if you are guaranteed to have records for each contact_id/fruit_type combo.
Index on contact_id should vastly improve performance.
and don’t use “SELECT *”, I’m just being lazy.
I should add that if you don’t expect every contact_id has a record for every fruit_type (thus you need to use outer joins here), then the case expressions should also test for null in addition to zero. (Added that above)