I came across this CTE solution for concatenating row elements and I thought it’s brilliant and I realized how powerful CTEs can be.
However, in order to use such a tool effectively I need to know how it works internally to build that mental image which is essential for beginners, like me, to use it in different scenarios.
So I tried to slow motion the process of the above snippet and here is the code
USE [NORTHWIND]
GO
/****** Object: Table [dbo].[Products2] Script Date: 10/18/2011 08:55:07 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
IF OBJECT_ID('Products2','U') IS NOT NULL DROP TABLE [Products2]
CREATE TABLE [dbo].[Products2](
[ProductID] [int] IDENTITY(1,1) NOT NULL,
[ProductName] [nvarchar](40) NOT NULL,
[SupplierID] [int] NULL,
[CategoryID] [int] NULL,
[QuantityPerUnit] [nvarchar](20) NULL,
[UnitPrice] [money] NULL,
[UnitsInStock] [smallint] NULL,
[UnitsOnOrder] [smallint] NULL,
[ReorderLevel] [smallint] NULL,
[Discontinued] [bit] NOT NULL
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[Products2] ON
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (1, N'vcbcbvcbvc', 1, 4, N'10 boxes x 20 bags', 18.0000, 39, 0, 10, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (2, N'Changassad', 1, 1, N'24 - 12 oz bottles', 19.0000, 17, 40, 25, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (3, N'Aniseed Syrup', 1, 2, N'12 - 550 ml bottles', 10.0000, 13, 70, 25, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (4, N'Chef Anton''s Cajun Seasoning', 2, 2, N'48 - 6 oz jars', 22.0000, 53, 0, 0, 0)
INSERT [dbo].[Products2] ([ProductID], [ProductName], [SupplierID], [CategoryID], [QuantityPerUnit], [UnitPrice], [UnitsInStock], [UnitsOnOrder], [ReorderLevel], [Discontinued]) VALUES (5, N'Chef Anton''s Gumbo Mix', 10, 2, N'36 boxes', 21.3500, 0, 0, 0, 1)
SET IDENTITY_INSERT [dbo].[Products2] OFF
GO
IF OBJECT_ID('DELAY_EXEC','FN') IS NOT NULL DROP FUNCTION DELAY_EXEC
GO
CREATE FUNCTION DELAY_EXEC() RETURNS DATETIME
AS
BEGIN
DECLARE @I INT=0
WHILE @I<99999
BEGIN
SELECT @I+=1
END
RETURN GETDATE()
END
GO
WITH CTE (EXEC_TIME, CategoryID, product_list, product_name, length)
AS (SELECT dbo.DELAY_EXEC(),
CategoryID,
CAST('' AS VARCHAR(8000)),
CAST('' AS VARCHAR(8000)),
0
FROM Northwind..Products2
GROUP BY CategoryID
UNION ALL
SELECT dbo.DELAY_EXEC(),
p.CategoryID,
CAST(product_list + CASE
WHEN length = 0 THEN ''
ELSE ', '
END + ProductName AS VARCHAR(8000)),
CAST(ProductName AS VARCHAR(8000)),
length + 1
FROM CTE c
INNER JOIN Northwind..Products2 p
ON c.CategoryID = p.CategoryID
WHERE p.ProductName > c.product_name)
SELECT *
FROM CTE
ORDER BY EXEC_TIME
--SELECT CategoryId, product_list
-- FROM ( SELECT CategoryId, product_list,
-- RANK() OVER ( PARTITION BY CategoryId ORDER BY length DESC )
-- FROM CTE ) D ( CategoryId, product_list, rank )
-- WHERE rank = 1 ;
The commented block is the desired output for the concatenation problem but it’s not the question here.
I’ve added a column EXEC_TIME to know which row got added first.
The output doesn’t look right to me for two reasons
-
I thought there would be a redundant data because of the condition
p.ProductName > c.product_namein another word the first part of the CTE the empty rows are always less then values in the Product2 table so each time it runs it should bring a new set of already added rows once again. Does this make any sense? -
The hierarchy of data is really weird the last item should be the longest and look what is the last item? An item with
length=1?
Any expert to the rescue? Thanks in advance.
Sample Results
EXEC_TIME CategoryID product_list product_name length
----------------------- ----------- ------------------------------------------------------------------- --------------------------------- -----------
2011-10-18 12:46:14.930 1 0
2011-10-18 12:46:14.990 2 0
2011-10-18 12:46:15.050 4 0
2011-10-18 12:46:15.107 4 vcbcbvcbvc vcbcbvcbvc 1
2011-10-18 12:46:15.167 2 Aniseed Syrup Aniseed Syrup 1
2011-10-18 12:46:15.223 2 Chef Anton's Cajun Seasoning Chef Anton's Cajun Seasoning 1
2011-10-18 12:46:15.280 2 Chef Anton's Gumbo Mix Chef Anton's Gumbo Mix 1
2011-10-18 12:46:15.340 2 Chef Anton's Cajun Seasoning, Chef Anton's Gumbo Mix Chef Anton's Gumbo Mix 2
2011-10-18 12:46:15.400 2 Aniseed Syrup, Chef Anton's Cajun Seasoning Chef Anton's Cajun Seasoning 2
2011-10-18 12:46:15.463 2 Aniseed Syrup, Chef Anton's Gumbo Mix Chef Anton's Gumbo Mix 2
2011-10-18 12:46:15.520 2 Aniseed Syrup, Chef Anton's Cajun Seasoning, Chef Anton's Gumbo Mi Chef Anton's Gumbo Mix 3
2011-10-18 12:46:15.580 1 Changassad Changassad 1
The page Recursive Queries Using Common Table Expressions describes the logic of CTEs:
However, that’s only the logical flow. As always, with SQL, the server is free to reorder operations as it sees fit, if the result will be “the same”, and the reordering is perceived to provide the results more efficiently.
The presence of your function with side effects (causing a delay, then returning
GETDATE()) isn’t something that would normally be considered when deciding whether to reorder operations.One obvious way in which the query may be reordered is that it may decide to start working on result set
Ti+1before it has fully created result setTi– it may be more efficient to do this than to fully constructTifirst, since the new rows are definitely already in memory and have been accessed recently.