I have a log of events for when a device either starts or stops with a failure code and I’m trying to calculate the average and mean times between a failure and a start. Here’s a very simple example table of data:
+----+-----------+---------------------+
| id | eventName | eventTime |
+----+-----------+---------------------+
| 1 | start | 2012-11-01 14:25:20 |
| 2 | fail A | 2012-11-01 14:27:45 |
| 3 | start | 2012-11-01 14:30:49 |
| 4 | fail B | 2012-11-01 14:32:54 |
| 5 | start | 2012-11-01 14:35:59 |
| 6 | fail A | 2012-11-01 14:37:02 |
| 7 | start | 2012-11-01 14:38:05 |
| 8 | fail A | 2012-11-01 14:40:09 |
| 9 | start | 2012-11-01 14:41:11 |
| 10 | fail C | 2012-11-01 14:43:14 |
+----+-----------+---------------------+
Create code:
CREATE TABLE `test` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`eventName` varchar(50) NOT NULL,
`eventTime` datetime NOT NULL,
PRIMARY KEY (`id`)
);
INSERT INTO `test` (`id`, `eventName`, `eventTime`) VALUES (1,'start','2012-11-01 14:25:20'),(2,'fail A','2012-11-01 14:27:45'),(3,'start','2012-11-01 14:30:49'),(4,'fail B','2012-11-01 14:32:54'),(5,'start','2012-11-01 14:35:59'),(6,'fail A','2012-11-01 14:37:02'),(7,'start','2012-11-01 14:38:05'),(8,'fail A','2012-11-01 14:40:09'),(9,'start','2012-11-01 14:41:11'),(10,'fail C','2012-11-01 14:43:14');
I can get the times between a start and a failure using something like this:
SET @time_prev := -1;
SELECT
*
FROM
(
SELECT
eventName
, eventTime
, @ts := UNIX_TIMESTAMP(eventTime) AS ts
, @started := IF(eventName = 'start', 1, 0) AS started
, @failed := IF(eventName <> 'start', 1, 0) AS failed
, @time_diff := IF(@time_prev > -1, @ts - @time_prev, 0) AS time_diff
, @time_prev := @ts AS time_prev
, @time_to_fail := IF(@failed, @time_diff, 0) AS time_to_fail
, @time_to_start := IF(@started, @time_diff, 0) AS time_to_start
FROM
test
) AS t1;
+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+
| eventName | eventTime | ts | started | failed | time_diff | time_prev | time_to_fail | time_to_start |
+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+
| start | 2012-11-01 14:25:20 | 1351805120 | 1 | 0 | 0 | 1351805120 | 0 | 0 |
| fail A | 2012-11-01 14:27:45 | 1351805265 | 0 | 1 | 145 | 1351805265 | 0 | 145 |
| start | 2012-11-01 14:30:49 | 1351805449 | 1 | 0 | 184 | 1351805449 | 184 | 0 |
| fail B | 2012-11-01 14:32:54 | 1351805574 | 0 | 1 | 125 | 1351805574 | 0 | 125 |
| start | 2012-11-01 14:35:59 | 1351805759 | 1 | 0 | 185 | 1351805759 | 185 | 0 |
| fail A | 2012-11-01 14:37:02 | 1351805822 | 0 | 1 | 63 | 1351805822 | 0 | 63 |
| start | 2012-11-01 14:38:05 | 1351805885 | 1 | 0 | 63 | 1351805885 | 63 | 0 |
| fail A | 2012-11-01 14:40:09 | 1351806009 | 0 | 1 | 124 | 1351806009 | 0 | 124 |
| start | 2012-11-01 14:41:11 | 1351806071 | 1 | 0 | 62 | 1351806071 | 62 | 0 |
| fail C | 2012-11-01 14:43:14 | 1351806194 | 0 | 1 | 123 | 1351806194 | 0 | 123 |
+-----------+---------------------+------------+---------+--------+-----------+------------+--------------+---------------+
But in order to get time between failure and start I have to move ahead to the next record and lose the grouping of that failure code. How can I move this to the next level and get the future to time start merged into a failed record so it can be grouped?
Ultimately, after calculating averages and medians I would end up with a result set like this:
+-----------+-------------+----------------+--------------+-----------------+
| eventName | avg_to_fail | median_to_fail | avg_to_start | median_to_start |
+-----------+-------------+----------------+--------------+-----------------+
| fail A | 110.66 | 124.00 | 103.00 | 63.00 |
| fail B | 125.00 | 125.00 | 185.00 | 185.00 |
+-----------+-------------+----------------+--------------+-----------------+
This gives the averages
Medians are a pain in SQL. Simple way to calculate median with MySQL gives some ideas. The two inner queries give the result sets to median over were there a median aggregate.
To aid understanding I’d first consider
This is the essence of the inner query
timeswithout the group by and the min statement. You’ll see this has a row combining every start event with every end event where the end event is after the start event. Call this X.The next part is
The key here is that Min(x.endTime) combines with the group by. So we’re getting the earliest end time after the start time (as X already constrained it to be after). Although I’ve only picked out the columns we need to use, we have access to start time id, end time id start event, end event, start time, min(end time) here. The reason you can adapt it to find the avg_to_start is because we pick the interesting event name, as we have both.
SQL Fiddle: http://sqlfiddle.com/#!2/90465/6