I need help optimizing or rewriting this complex query. It worked well on test data of 50 rows, but the real data has over 700,000 rows and the query now takes more than five minutes to run. I have indexes on the primary keys of both tables. I believe the age function accounts for much of the cost: if I take it out, the query runs about 2½ minutes faster. Any suggestions are appreciated. Thanks in advance.
WITH T AS (
SELECT TOP 2000
A.Residence_City,
CASE
WHEN A.Gender = 'M' then 'Male'
WHEN A.Gender = 'F' then 'Female'
WHEN A.Gender = 'U' then 'Unknown'
END AS Gender,
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 18 and 24 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [0_3_18],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 18 and 24 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 1 then 1 else null end) as [1_3_18],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 18 and 24 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 2 then 1 else null end) as [2_3_18],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 18 and 24 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 3 then 1 else null end) as [3_3_18],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 18 and 24 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [Unknown_18],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 25 and 34 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [0_3_25],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 25 and 34 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 1 then 1 else null end) as [1_3_25],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 25 and 34 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 2 then 1 else null end) as [2_3_25],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 25 and 34 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 3 then 1 else null end) as [3_3_25],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 25 and 34 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [Unknown_25],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 35 and 49 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [0_3_35],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 35 and 49 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 1 then 1 else null end) as [1_3_35],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 35 and 49 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 2 then 1 else null end) as [2_3_35],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 35 and 49 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 3 then 1 else null end) as [3_3_35],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 35 and 49 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [Unknown_35],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 50 and 64 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [0_3_50],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 50 and 64 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 1 then 1 else null end) as [1_3_50],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 50 and 64 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 2 then 1 else null end) as [2_3_50],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 50 and 64 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 3 then 1 else null end) as [3_3_50],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 50 and 64 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [Unknown_50],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 65 and 120 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [0_3_65],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 65 and 120 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 1 then 1 else null end) as [1_3_65],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 65 and 120 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 2 then 1 else null end) as [2_3_65],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 65 and 120 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 3 then 1 else null end) as [3_3_65],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) between 65 and 120 and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [Unknown_65],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) = '' or voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() )is null and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [0_3_],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) = '' or voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() )is null and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 1 then 1 else null end) as [1_3_],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) = '' or voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() )is null and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 2 then 1 else null end) as [2_3_],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) = '' or voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() )is null and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 3 then 1 else null end) as [3_3_],
count(case when voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() ) = '' or voterData.dbo.ufn_GetAge ( convert(datetime,[Date_of_Birth]), GETDATE() )is null and voterData.dbo.GeneralVoting (V.[G10EC], V.[G08EC],V.[G06EC]) = 0 then 1 else null end) as [Unknown_]
from Personal as A INNER JOIN Voting_History V
on A.Vuid = V.Vuid
group by Residence_City, Gender with rollup
Order by Residence_City, Gender
)
, temp1 as (
SELECT *,([3_3_18] + [3_3_25] + [3_3_35] + [3_3_50] + [3_3_65] + [3_3_]
+[2_3_18] + [2_3_25] + [2_3_35] + [2_3_50] + [2_3_65] + [2_3_]
+[1_3_18] + [1_3_25] + [1_3_35] + [1_3_50] + [1_3_65] + [1_3_]
+[0_3_18] + [0_3_25] + [0_3_35] + [0_3_50] + [0_3_65] + [0_3_]
+[Unknown_18]+ [Unknown_25]+ [Unknown_35] + [Unknown_50]+ [Unknown_65] + [Unknown_]) Total
FROM T
where NULLIF(Residence_City,'') IS NOT NULL
), temp2 as (
SELECT * FROM temp1
UNION ALL
select
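'grand Total', NULL, -- NULL fills the Gender column so both sides of the UNION ALL have the same number of columns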
SUM([0_3_18])/2,SUM([1_3_18])/2,SUM([2_3_18])/2,SUM([3_3_18])/2,SUM([Unknown_18])/2,
SUM([0_3_25])/2,SUM([1_3_25])/2,SUM([2_3_25])/2,SUM([3_3_25])/2,SUM([Unknown_25])/2,
SUM([0_3_35])/2,SUM([1_3_35])/2,SUM([2_3_35])/2,SUM([3_3_35])/2,SUM([Unknown_35])/2,
SUM([0_3_50])/2,SUM([1_3_50])/2,SUM([2_3_50])/2,SUM([3_3_50])/2,SUM([Unknown_50])/2,
SUM([0_3_65])/2,SUM([1_3_65])/2,SUM([2_3_65])/2,SUM([3_3_65])/2,SUM([Unknown_65])/2,
SUM([0_3_])/2,SUM([1_3_])/2,SUM([2_3_])/2,SUM([3_3_])/2,SUM([Unknown_])/2,
sum(Total)/2
FROM temp1
)
SELECT Residence_City, Gender,
[0_3_18] as [0_3],
[1_3_18] as [1_3],
[2_3_18] as [2_3],
[3_3_18] as [3_3],
[Unknown_18] as [Unknown],
[0_3_25] as [0_3],
[1_3_25] as [1_3],
[2_3_25] as [2_3],
[3_3_25] as [3_3],
[Unknown_25] as [Unknown],
[0_3_35] as [0_3],
[1_3_35] as [1_3],
[2_3_35] as [2_3],
[3_3_35] as [3_3],
[Unknown_35] as [Unknown],
[0_3_50] as [0_3],
[1_3_50] as [1_3],
[2_3_50] as [2_3],
[3_3_50] as [3_3],
[Unknown_50] as [Unknown],
[0_3_65] as [0_3],
[1_3_65] as [1_3],
[2_3_65] as [2_3],
[3_3_65] as [3_3],
[Unknown_65] as [Unknown],
[0_3_] as [0_3],
[1_3_] as [1_3],
[2_3_] as [2_3],
[3_3_] as [3_3],
[Unknown_] as [Unknown],
Total
FROM temp2
These are the functions:
ALTER FUNCTION [dbo].[GeneralVoting] ( @one varchar, @two varchar,@three varchar)
RETURNS INT
AS
BEGIN
DECLARE @vAge INT
SET @vAge = (CASE WHEN @one IS NOT NULL THEN 1 ELSE 0 END)
+(CASE WHEN @two IS NOT NULL THEN 1 ELSE 0 END)
+(CASE WHEN @three IS NOT NULL THEN 1 ELSE 0 END)
RETURN @vAge
END
and
ALTER FUNCTION [dbo].[ufn_GetAge] ( @pDateOfBirth DATETIME, @pAsOfDate DATETIME )
RETURNS INT
AS
BEGIN
DECLARE @vAge INT
IF @pDateOfBirth >= @pAsOfDate
RETURN 0
SET @vAge = DATEDIFF(YY, @pDateOfBirth, @pAsOfDate)
IF MONTH(@pDateOfBirth) > MONTH(@pAsOfDate) OR
(MONTH(@pDateOfBirth) = MONTH(@pAsOfDate) AND
DAY(@pDateOfBirth) > DAY(@pAsOfDate))
SET @vAge = @vAge - 1
RETURN @vAge
END
The join structures seem reasonable. I’m not sure if you need the rollup, but it probably does not affect performance.
I suspect the user-defined functions are affecting performance. SQL Server generally does not optimize away the repeated calls to ufn_GetAge(), so the function is evaluated again for every count expression on every row. Instead, compute the age once per row in a subquery and have the outer counts reference that column.
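A minimal sketch of the subquery approach, under a few assumptions: Date_of_Birth is taken to live in Personal (the original query does not qualify it), and the aliases p, Age and Votes are names made up for illustration. Only three of the thirty count columns are shown, and Gender is left as the raw code rather than mapped to 'Male'/'Female'; the rest of the query would follow the same pattern. The optimizer is free to merge the derived table back into the outer query, so it is worth checking the execution plan to confirm the calls are actually reduced.

```sql
-- Sketch: call each scalar function once per joined row in a derived table,
-- then let the outer counts test the precomputed Age and Votes columns.
SELECT p.Residence_City,
       p.Gender,
       COUNT(CASE WHEN p.Age BETWEEN 18 AND 24 AND p.Votes = 0 THEN 1 END) AS [0_3_18],
       COUNT(CASE WHEN p.Age BETWEEN 18 AND 24 AND p.Votes = 1 THEN 1 END) AS [1_3_18],
       COUNT(CASE WHEN p.Age BETWEEN 25 AND 34 AND p.Votes = 0 THEN 1 END) AS [0_3_25]
       -- ...remaining age-band / vote-count columns follow the same pattern
FROM (
    SELECT A.Residence_City,
           A.Gender,
           -- Assumption: Date_of_Birth is a column of Personal
           voterData.dbo.ufn_GetAge(CONVERT(datetime, A.Date_of_Birth), GETDATE()) AS Age,
           voterData.dbo.GeneralVoting(V.G10EC, V.G08EC, V.G06EC) AS Votes
    FROM Personal AS A
    INNER JOIN Voting_History AS V
        ON A.Vuid = V.Vuid
) AS p
GROUP BY p.Residence_City, p.Gender WITH ROLLUP;
```

If the plan still shows the functions being invoked many times per row, materializing the derived table into a temp table first is one way to force a single evaluation.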
The function is pretty simple, so you could also replace it with an in-line expression and eliminate the function call altogether.
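One way the in-line replacement might look, mirroring the logic of ufn_GetAge step by step. The CROSS APPLY and the aliases d, dob and today are illustrative choices, not part of the original query, and Date_of_Birth is again assumed to be a column of Personal.

```sql
-- Sketch: the body of ufn_GetAge written as a plain expression.
-- The CROSS APPLY only names the converted birth date and the current date once.
SELECT A.Vuid,
       CASE
           WHEN d.dob >= d.today THEN 0   -- same guard as the function
           ELSE DATEDIFF(YEAR, d.dob, d.today)
                - CASE WHEN MONTH(d.dob) > MONTH(d.today)
                         OR (MONTH(d.dob) = MONTH(d.today)
                             AND DAY(d.dob) > DAY(d.today))
                       THEN 1 ELSE 0 END  -- birthday not yet reached this year
       END AS Age
FROM Personal AS A
CROSS APPLY (SELECT CONVERT(datetime, A.Date_of_Birth) AS dob,
                    GETDATE() AS today) AS d;
```

A NULL birth date yields a NULL Age here, just as it does with the function.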
You can take the same approach for the other function as well.
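For GeneralVoting, the in-line form is just the sum of three CASE expressions taken from the function body. The alias Votes is a hypothetical name.

```sql
-- Sketch: GeneralVoting in-lined; counts how many of the three
-- election columns are non-NULL for each voter.
SELECT V.Vuid,
       CASE WHEN V.G10EC IS NOT NULL THEN 1 ELSE 0 END
     + CASE WHEN V.G08EC IS NOT NULL THEN 1 ELSE 0 END
     + CASE WHEN V.G06EC IS NOT NULL THEN 1 ELSE 0 END AS Votes
FROM Voting_History AS V;
```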
Also, how large are the Personal and Voting_History tables? In particular, how many combinations of city and gender are there? You might be able to make the query more efficient by pre-selecting the top 2000 and then doing the calculations for the different groups. Without knowing the sizes, though, it is hard to tell whether this would help. Finally, do you really mean to order by Residence_City, Gender, or do you want to order by one of the other fields? As written, TOP 2000 simply keeps the first 2000 rows in alphabetical order by city and gender.