I have a query that needs to get the highest- and second-highest-priced SKU in each member's wishlist. The query below works, but it takes far too long. There are about 9 million users and each has about 10 wishlist items (roughly 90 million rows to rank), so in practice the query never finishes.
SELECT m.ID AS MemberID,
       -- pivot each member's rank-1 and rank-2 items into two columns
       MAX(CASE WHEN w.rank = 1 THEN w.SKU ELSE NULL END) AS [highestSku],
       MAX(CASE WHEN w.rank = 2 THEN w.SKU ELSE NULL END) AS [secondHighestSku]
FROM Member m
LEFT JOIN (SELECT *
           -- rank every member's wishlist items by price, most expensive first
           FROM (SELECT DENSE_RANK() OVER (PARTITION BY wl.MemberID ORDER BY wli.Price DESC) AS rank,
                        wl.MemberID, wli.SKU
                 FROM WishListItem wli
                 INNER JOIN WishList wl ON wli.WishListID = wl.ID) T1) w ON w.MemberID = m.ID
GROUP BY m.ID  -- one output row per member
My question is: is there a better way to get the first and second records for each user? If not, is there a way I can optimize this query? Ideally, I would restrict the number of items returned by the ranking query (the one with DENSE_RANK()), which should help. I wanted to do something like WHERE DENSE_RANK() <= 2, but window functions aren't allowed in a WHERE clause, and filtering outside the parentheses defeats the purpose of the solution.
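Because window functions are computed after WHERE in logical query processing, the usual workaround for "WHERE rank <= 2" is to compute the rank in a CTE (or derived table) and filter on it one query level up. The sketch below shows that shape. So that it runs standalone, it uses Python's built-in `sqlite3` module (assuming the bundled SQLite is 3.25 or newer, which window functions need) and a tiny made-up copy of the `Member`/`WishList`/`WishListItem` tables; the SQL string itself uses only CTEs, `ROW_NUMBER()`, and `CASE`, which SQL Server also supports. It deliberately swaps `DENSE_RANK()` for `ROW_NUMBER()` with `SKU` as a tie-breaker, so price ties still yield at most two rows per member; whether that matches the intended tie behaviour is an assumption.

```python
import sqlite3

# Hypothetical miniature of the question's schema with made-up rows:
# member 1 has three items, member 2 has one, member 3 has no wishlist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Member (ID INTEGER PRIMARY KEY);
CREATE TABLE WishList (ID INTEGER PRIMARY KEY, MemberID INTEGER);
CREATE TABLE WishListItem (WishListID INTEGER, SKU TEXT, Price REAL);
INSERT INTO Member VALUES (1), (2), (3);
INSERT INTO WishList VALUES (10, 1), (20, 2);
INSERT INTO WishListItem VALUES
  (10, 'A', 5.0), (10, 'B', 50.0), (10, 'C', 20.0),
  (20, 'D', 9.0);
""")

TOP_TWO_SQL = """
WITH ranked AS (
    -- Window functions are evaluated after WHERE, so compute the rank here...
    SELECT wl.MemberID,
           wli.SKU,
           ROW_NUMBER() OVER (PARTITION BY wl.MemberID
                              ORDER BY wli.Price DESC, wli.SKU) AS rn
    FROM WishListItem wli
    INNER JOIN WishList wl ON wli.WishListID = wl.ID
),
top_two AS (
    -- ...and filter on it one level up, which is what "WHERE rank <= 2" needs.
    SELECT MemberID, SKU, rn FROM ranked WHERE rn <= 2
)
SELECT m.ID AS MemberID,
       MAX(CASE WHEN t.rn = 1 THEN t.SKU END) AS highestSku,
       MAX(CASE WHEN t.rn = 2 THEN t.SKU END) AS secondHighestSku
FROM Member m
LEFT JOIN top_two t ON t.MemberID = m.ID  -- LEFT JOIN keeps members with no items
GROUP BY m.ID
ORDER BY m.ID
"""

rows = conn.execute(TOP_TWO_SQL).fetchall()
print(rows)
```

With `DENSE_RANK()`, two items tied at the top price would both be rank 1 and the second column would hold the next distinct price; with `ROW_NUMBER()` the tied items fill both columns. Whether the outer filter actually reduces work on the real tables depends on the execution plan, so it is worth checking the plan rather than assuming.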
Also, this is just part of the query. I actually have several more left joins across other tables that are just as large, and for each of them I need the top 1 and 2 records per user.
And this needs to be done in one query, or as much of it as possible, because I'm loading the results into a data table. I could reduce the number of records (e.g., TOP 1000) and break the query up, but then I need to be able to continue from where I left off. I did try TOP 1000, but I cancelled it after 10 minutes, and I need to get all 9 million records out.
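For the "continue from where I left off" requirement, one common option is keyset pagination: process members in `ID` order a fixed-size batch at a time, and keep the last `ID` returned as the resume point. Each batch seeks straight to `ID > last_id` instead of rescanning earlier rows, and the ranking is restricted to the members in that batch. The sketch uses Python's `sqlite3` with a tiny made-up copy of the question's tables; SQLite spells the batch limit `LIMIT ?`, where SQL Server would use `SELECT TOP (@BatchSize) ...`. The helper `fetch_in_batches`, its parameters, and the data are hypothetical, and it assumes `Member.ID` is a unique, positive integer key.

```python
import sqlite3

# Hypothetical miniature of the question's tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Member (ID INTEGER PRIMARY KEY);
CREATE TABLE WishList (ID INTEGER PRIMARY KEY, MemberID INTEGER);
CREATE TABLE WishListItem (WishListID INTEGER, SKU TEXT, Price REAL);
INSERT INTO Member VALUES (1), (2), (3);
INSERT INTO WishList VALUES (10, 1), (20, 2);
INSERT INTO WishListItem VALUES
  (10, 'A', 5.0), (10, 'B', 50.0), (10, 'C', 20.0),
  (20, 'D', 9.0);
""")

BATCH_SQL = """
WITH batch AS (
    -- Next slice of members after the resume point (keyset pagination)
    SELECT ID FROM Member WHERE ID > ? ORDER BY ID LIMIT ?
),
ranked AS (
    -- Rank only the wishlist items that belong to this slice
    SELECT wl.MemberID, wli.SKU,
           ROW_NUMBER() OVER (PARTITION BY wl.MemberID
                              ORDER BY wli.Price DESC, wli.SKU) AS rn
    FROM batch b
    INNER JOIN WishList wl ON wl.MemberID = b.ID
    INNER JOIN WishListItem wli ON wli.WishListID = wl.ID
)
SELECT b.ID AS MemberID,
       MAX(CASE WHEN r.rn = 1 THEN r.SKU END) AS highestSku,
       MAX(CASE WHEN r.rn = 2 THEN r.SKU END) AS secondHighestSku
FROM batch b
LEFT JOIN ranked r ON r.MemberID = b.ID AND r.rn <= 2
GROUP BY b.ID
ORDER BY b.ID
"""


def fetch_in_batches(conn, batch_size, last_id=0):
    """Yield one list of result rows per batch of members.

    Pass the last MemberID saved from an earlier run as last_id to resume.
    """
    while True:
        batch = conn.execute(BATCH_SQL, (last_id, batch_size)).fetchall()
        if not batch:
            return
        yield batch
        last_id = batch[-1][0]  # resume point: highest MemberID in this batch


batches = list(fetch_in_batches(conn, batch_size=2))
print(batches)
```

Persisting `last_id` after each batch is written out is what makes a cancelled run restartable.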
I’d grab a relatively small subset of the data, stick it in a table variable, and run the query off that instead of the main (and likely very “busy”) tables:
Make sure that the WHERE clause defines a narrower subset of data than your production tables. If performance still suffers, you could create table variables for the other tables you’re joining, then use those in the final query.
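A minimal sketch of the staging idea, under stated assumptions: SQLite has no table variables, so temporary tables (`CREATE TEMP TABLE`) stand in for them here; in SQL Server the same steps would use `DECLARE @Subset TABLE (...)` followed by `INSERT INTO @Subset SELECT ...`. The table names `SubsetMembers`/`SubsetItems`, the `BETWEEN` range used as the narrowing `WHERE` clause, the helper `top_two_for_range`, and the data are all hypothetical, chosen only to illustrate the approach.

```python
import sqlite3

# Hypothetical miniature of the production tables with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Member (ID INTEGER PRIMARY KEY);
CREATE TABLE WishList (ID INTEGER PRIMARY KEY, MemberID INTEGER);
CREATE TABLE WishListItem (WishListID INTEGER, SKU TEXT, Price REAL);
INSERT INTO Member VALUES (1), (2), (3);
INSERT INTO WishList VALUES (10, 1), (20, 2);
INSERT INTO WishListItem VALUES
  (10, 'A', 5.0), (10, 'B', 50.0), (10, 'C', 20.0),
  (20, 'D', 9.0);
""")


def top_two_for_range(conn, lo, hi):
    """Stage members lo..hi and their items, then rank only the staged rows."""
    cur = conn.cursor()
    # Temp tables stand in for T-SQL table variables (hypothetical names).
    cur.execute("DROP TABLE IF EXISTS temp.SubsetMembers")
    cur.execute("DROP TABLE IF EXISTS temp.SubsetItems")
    cur.execute("CREATE TEMP TABLE SubsetMembers (ID INTEGER PRIMARY KEY)")
    cur.execute("CREATE TEMP TABLE SubsetItems (MemberID INTEGER, SKU TEXT, Price REAL)")
    # These WHERE clauses define the narrow subset copied out of the main tables.
    cur.execute("INSERT INTO SubsetMembers SELECT ID FROM Member WHERE ID BETWEEN ? AND ?",
                (lo, hi))
    cur.execute("""
        INSERT INTO SubsetItems (MemberID, SKU, Price)
        SELECT wl.MemberID, wli.SKU, wli.Price
        FROM WishListItem wli
        INNER JOIN WishList wl ON wli.WishListID = wl.ID
        WHERE wl.MemberID BETWEEN ? AND ?
    """, (lo, hi))
    # The ranking query now reads only the staged rows.
    rows = cur.execute("""
        WITH ranked AS (
            SELECT MemberID, SKU,
                   ROW_NUMBER() OVER (PARTITION BY MemberID
                                      ORDER BY Price DESC, SKU) AS rn
            FROM SubsetItems
        )
        SELECT s.ID AS MemberID,
               MAX(CASE WHEN r.rn = 1 THEN r.SKU END) AS highestSku,
               MAX(CASE WHEN r.rn = 2 THEN r.SKU END) AS secondHighestSku
        FROM SubsetMembers s
        LEFT JOIN ranked r ON r.MemberID = s.ID AND r.rn <= 2
        GROUP BY s.ID
        ORDER BY s.ID
    """).fetchall()
    conn.commit()
    return rows


print(top_two_for_range(conn, 1, 2))
```

The other joined tables could be staged the same way, each filtered to the same member range, before running the final query.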