I have created a table like this:
CREATE TABLE #TEMP(RecordDate datetime, First VARCHAR(255), Last VARCHAR(255), Value int)
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','smith','10')
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','adams','60')
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','resig','90')
INSERT INTO #TEMP VALUES('2011-03-01 00:00:00.000','john','balte','95')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','smith','98')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','adams','67')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','resig','24')
INSERT INTO #TEMP VALUES('2011-03-01 01:00:00.000','john','balte','20')
SELECT * FROM #TEMP
DROP TABLE #TEMP
which now contains the following records:
RecordDate First Last Value
2011-03-01 00:00:00.000 john smith 10
2011-03-01 00:00:00.000 john adams 60
2011-03-01 00:00:00.000 john resig 90
2011-03-01 00:00:00.000 john balte 95
2011-03-01 01:00:00.000 john smith 98
2011-03-01 01:00:00.000 john adams 67
2011-03-01 01:00:00.000 john resig 24
2011-03-01 01:00:00.000 john balte 20
I am trying to obtain a table like the following:
RecordDate First Good Bad
2011-03-01 00:00:00.000 john 3 1
2011-03-01 01:00:00.000 john 2 2
I compute Good and Bad by taking the MAX of Value over all rows that share the same first name and record date, then comparing every row in that group against it. Only values greater than 0.5*MAXValue are considered Good.
In the result table, the first row has 3 good values because the maximum for the first date was 95 and only 60, 90 and 95 are greater than 0.5*95 = 47.5, so (Good, Bad) = (3, 1). For the second date the maximum is 98, only 98 and 67 exceed 49, so the result is (2, 2).
My table is large, close to 300 million records, and I don't know where to start to do this efficiently. Any suggestions on what an efficient approach might look like?
My current (working but expensive) approach is given below:
SELECT RecordDate
, First
,
(
SELECT COUNT(*)
FROM #TEMP
WHERE Value > 0.5*(SELECT MAX(Value) FROM #TEMP WHERE RecordDate = A.RecordDate AND First = A.First)
AND RecordDate = A.RecordDate AND First = A.First
) AS Good
,
(
SELECT COUNT(*)
FROM #TEMP
WHERE Value <= 0.5*(SELECT MAX(Value) FROM #TEMP WHERE RecordDate = A.RecordDate AND First = A.First)
AND RecordDate = A.RecordDate AND First = A.First
) AS Bad
FROM #TEMP A
GROUP BY RecordDate, First;
Here you go:
The trick is creating a derived table with the max values for each record date and then INNER JOIN it with the table itself. Once you get the max values solved, you can access them directly.
Update
I see you updated your question and included the first name in the result. Never fear, here’s the solution:
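The query itself is not shown here, so what follows is only a minimal sketch of the derived-table approach described above, not necessarily the exact query originally posted. It assumes the #TEMP table from the question, with its First column. The aliases t and m and the MaxValue column name are illustrative. Counting a value that is exactly half the maximum as Bad is also an assumption, based on the question's rule that only values strictly greater than half are Good.

```sql
-- Derived table m: one row per (RecordDate, First) holding that group's maximum.
-- Joining it back to the base table lets every row be classified in one pass,
-- instead of re-running a correlated MAX subquery for each output row.
SELECT t.RecordDate,
       t.First,
       SUM(CASE WHEN t.Value >  0.5 * m.MaxValue THEN 1 ELSE 0 END) AS Good,
       -- Assumption: a value exactly equal to half the maximum counts as Bad.
       SUM(CASE WHEN t.Value <= 0.5 * m.MaxValue THEN 1 ELSE 0 END) AS Bad
FROM #TEMP AS t
INNER JOIN (
    SELECT RecordDate, First, MAX(Value) AS MaxValue
    FROM #TEMP
    GROUP BY RecordDate, First
) AS m
    ON  m.RecordDate = t.RecordDate
    AND m.First      = t.First
GROUP BY t.RecordDate, t.First
ORDER BY t.RecordDate, t.First;
```

Against the sample rows in the question this should give (3, 1) for the 00:00 group and (2, 2) for the 01:00 group. On a table of roughly 300 million rows, an index on (RecordDate, First) that also covers Value would likely help both the derived table and the join, though that depends on the existing indexes and is worth checking against the actual execution plan.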