I need to perform this Oracle query in SQL Server:
select case_id, channel_index,
min(su_min) as sustained_min,
max(su_max) as sustained_max
from (
select case_id, channel_index, start_time,
min(dms_value) over (partition by case_id, channel_index order by start_time
range numtodsinterval(3, 'minute') preceeding) as su_max,
max(dms_value) over (partition by case_id, channel_index order by start_time
range numtodsinterval(3, 'minute') preceeding) as su_min,
min(start_time) over (partition by case_id, channel_index order by start_time)
as first_time
from data_table order by start_time
) as su_data
where
first_time + numtodsinterval(3, 'minute') <= start_time
group by
case_id, channel_index
Here is what I attempted in basic T-SQL which does the job but when a case has 1 million+ records it takes more that 37 mins (after which I cancelled the query):
ALTER procedure [dbo].[GetSustainedValues](
@case_id int,
@time_limit int,
@bypass_only bit = NULL)
as
begin
DECLARE @time DateTime, @channelindex int, @lastchannelindex int
DECLARE @tmin float, @tmax float, @min float, @max float, @caseid int
DECLARE @results TABLE(case_id int, channel_index int, max float null, min float null)
DECLARE CursorName CURSOR FAST_FORWARD
FOR SELECT start_time, channel_index from continuous_data where case_id = @case_id order by channel_index, start_time
OPEN CursorName
FETCH NEXT FROM CursorName INTO @time, @channelindex
SET @lastchannelindex = @channelindex
WHILE @@FETCH_STATUS = 0
BEGIN
--PRINT 'hello' --'Chennel:' + CONVERT (VARCHAR(50), @channelindex,128) + ' Time:' + CONVERT (VARCHAR(50), @time,128)
IF @lastchannelindex != @channelindex
BEGIN
--PRINT 'Starting new channel:' + CONVERT (VARCHAR(50), @channelindex,128)
-- we are starting on a new channel so insert that data into the results
-- table and reset the min/max
INSERT INTO @results(case_id, channel_index, max, min) VALUES(@case_id, @lastchannelindex, @max, @min)
SET @max = null
SET @min = null
SET @lastchannelindex = @channelindex
END
Select @tmax = MAX(dms_value), @tmin = MIN(dms_value)
from continuous_data
where case_id = @case_id and channel_index = @channelindex and start_time between DATEADD(s, -(@time_limit-1), @time) and @time
HAVING SUM(value_duration) >= @time_limit
IF @@ROWCOUNT > 0
BEGIN
IF @max IS null OR @tmin > @max
BEGIN
--PRINT 'Setting max:' + CONVERT (VARCHAR(50), @tmin,128) + ' for channel:' + CONVERT (VARCHAR(50), @channelindex,128)
set @max = @tmin
END
IF @min IS null OR @tmax < @min
BEGIN
set @min = @tmax
END
END
--PRINT 'Max:' + CONVERT (VARCHAR(50), @max,128) + ' Min:' + CONVERT (VARCHAR(50), @min,128)
FETCH NEXT FROM CursorName INTO @time, @channelindex
END
CLOSE CursorName
DEALLOCATE CursorName
--PRINT 'Max:' + CONVERT (VARCHAR(50), @max,128) + ' Min:' + CONVERT (VARCHAR(50), @min,128)
SELECT * FROM @results
end
Is this a good place to use a CLR stored procedure? Any other ideas to make this a more efficient query?
EDIT 3-9-2012:
Don’t focus on the “first_time” field. It is there to make sure that the 3 minute window starts 3 minutes into the data set. In my query I don’t care about the first_time. All I need is the min/max sustained value for all 3 minute periods per channel.
Here is some sample data that contains 2 channels. Notice that the duration of each sample is not always the same:
CREATE TABLE #continuous_data
(
case_id int
, channel_index int
, start_time datetime
, dms_value float,
, value_duration smallint
)
INSERT #continuous_data VALUES (2081, 51, '2011-05-18 09:36:34.000', 90, 6)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:39.000', 94.8125, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:40.000', 95.4375, 1)
INSERT #continuous_data VALUES (2081, 51, '2011-05-18 09:36:40.000', 96, 6)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:41.000', 96.75, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:42.000', 98.0625, 2)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:44.000', 99.3125, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:45.000', 100.625, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:46.000', 101.9375, 2)
INSERT #continuous_data VALUES (2081, 51, '2011-05-18 09:36:46.000', 98, 6)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:48.000', 103.25, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:49.000', 104.5625, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:50.000', 105.8125, 2)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:52.000', 107.125, 1)
INSERT #continuous_data VALUES (2081, 51, '2011-05-18 09:36:52.000', 92, 6)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:53.000', 108.4375, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:54.000', 109.75, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:55.000', 111.0625, 2)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:57.000', 112.3125, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:58.000', 113.625, 1)
INSERT #continuous_data VALUES (2081, 51, '2011-05-18 09:36:58.000', 86, 6)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:36:59.000', 114.9375, 2)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:37:01.000', 116.25, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:37:02.000', 117.5, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:37:03.000', 118.8125, 2)
INSERT #continuous_data VALUES (2081, 51, '2011-05-18 09:37:04.000', 80, 6)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:37:05.000', 120.125, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:37:06.000', 121.4375, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:37:07.000', 122.75, 1)
INSERT #continuous_data VALUES (2081, 50, '2011-05-18 09:37:08.000', 124.0625, 1)
If I understand correct, you want the following
For each case_id, channel_index combination:
value)
sustained value).
start_timevalue, exclude that data.There are still several unexplained differences between the Oracle query and your solution (both the stored procedure and CLR stored procedure):
first_time + numtodsinterval(3, 'minute') <= start_timeremoves the time windows before the first 3 minutes has elapsed.value_durationcolumn is in the sample data, but not used in the solutionSOLUTION
— This may not be the fastest solution, but it should work —
Step 0: Window Time Range — The sample data does not include 3 minutes of data, so I used a variable to hold the desired number of seconds for the window time range. For the actual data, you could use 180 seconds.
Step 1: First Time — Although the
first_timeisn’t important, it is still necessary to make sure we don’t include incomplete time periods. It will be used later to exclude data before the first complete time period has elapsed.Step 2: Time Windows — The Oracle query uses
partition by case_id, channel_index order by start_time range numtodsinterval(3, 'minute') preceedingto find the minimum and maximumdms_valueas well as thefirst_timein the subquery. Since SQL Server does not have therangefunctionality, you need to use a subquery to define the 3 minute windows. The Oracle query usesrange ... preceeding, so the SQL Server range will useDATEADDwith a negative value:Step 3: MIN/MAX for Time Windows — Next you need to find the minimum and maximum values for each window. This is where the majority of the calculation is performed and needs the most debugging to get the expected results.
Step 4: Finally, you can put it all together to return the lowest MAX value and highest MIN value for each time window: