I’ve run into an interesting problem.
I have a table of worker IDs and the days of their visits. Here is the dump:
CREATE TABLE `pp` (
`id` int(11) DEFAULT '1',
`day` int(11) DEFAULT '1',
`key` varchar(45) NOT NULL,
PRIMARY KEY (`key`)
);
INSERT INTO `pp` VALUES
(1,1,'1'),
(1,20,'2'),
(1,50,'3'),
(1,70,'4'),
(2,1,'5'),
(2,120,'6'),
(2,90,'7'),
(1,90,'8'),
(2,100,'9');
So I need to find the workers who have missed more than 50 days in a row at least once. For example, if a worker visited on days 5, 95, 96 and 97, the largest delta between consecutive visits (90) is more than 50, so that worker should be included in the result.
The problem is: how do I efficiently find the deltas between consecutive visits of each worker?
I can’t even imagine how to treat MySQL tables as sequential arrays of data.
So we need to separate the day values for each worker, sort them, and then find the max delta for each. But how? Is there any way to, for example, enumerate sorted arrays in SQL?
This is how I usually deal with such problems:
(EDIT this version is slightly better)
This finds all the IDs for which there is a delta > 50. (I assumed that this is what you’re after)
To see it working: SQL fiddle
To find the max deltas:
The logic behind this is to find the “next” item, whatever that means. As this is an ordered attribute, the next item can be defined as the row with the lowest value among those rows whose value is larger than the one being examined. Then you join the “next” values to the original values, compute the delta, and return only the rows that qualify. If you need the other columns too, just JOIN the outer select to the original table.
I’m not sure this is the best solution performance-wise, but I only wrote such queries for one-off reports, where I could afford to let the query run for a while.
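A minimal sketch of that “next item” logic, not necessarily the exact query from this answer. It runs the SQL through Python’s built-in sqlite3 module only so that it can be tried without a MySQL server (that choice is my assumption); the SELECT itself uses nothing beyond a self-join, MIN/MAX and GROUP BY, so it should port to MySQL as-is. The alias names (a, b, gaps, next_day, max_delta) are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pp (id INTEGER, day INTEGER, `key` TEXT PRIMARY KEY);
INSERT INTO pp VALUES
  (1,1,'1'),(1,20,'2'),(1,50,'3'),(1,70,'4'),(2,1,'5'),
  (2,120,'6'),(2,90,'7'),(1,90,'8'),(2,100,'9');
""")

# Inner query: for every visit, the "next" visit is the smallest
# later day of the same worker. Rows with no later visit drop out
# of the inner join, so the last visit of each worker has no gap.
# Outer query: largest gap per worker.
MAX_GAP_SQL = """
SELECT id, MAX(next_day - day) AS max_delta
FROM (
    SELECT a.id, a.day, MIN(b.day) AS next_day
    FROM pp AS a
    JOIN pp AS b ON b.id = a.id AND b.day > a.day
    GROUP BY a.id, a.day
) AS gaps
GROUP BY id
"""

max_deltas = dict(conn.execute(MAX_GAP_SQL).fetchall())
absent_ids = sorted(i for i, d in max_deltas.items() if d > 50)

print(max_deltas)   # largest gap per worker id
print(absent_ids)   # worker ids with at least one gap > 50
```

With the sample data from the question, worker 1 has a largest gap of 30 days and worker 2 one of 89 days, so only worker 2 is reported. In pure SQL the final filter could instead be a HAVING MAX(next_day - day) > 50 clause on the outer query.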
There is one semantic error that can arise, though: if somebody was present on the 1st, 2nd and 3rd days but never after, this does not find the absence. To overcome this, you could add a special row for each ID by UNIONing in a select that supplies tomorrow’s day count for all IDs, but that would make this query disgusting enough not to try writing it down…
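For what it’s worth, here is a rough sketch of that UNION idea, under a few loudly stated assumptions: the “tomorrow” day number (130 here) is invented, worker 3 is an extra hypothetical row set added to show the trailing-absence case, and the query uses a CTE (WITH), which needs MySQL 8.0 or later; on older versions the UNION would have to be inlined as a derived table in both places. As before, sqlite3 is used only to make the sketch runnable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pp (id INTEGER, day INTEGER, `key` TEXT PRIMARY KEY);
INSERT INTO pp VALUES
  (1,1,'1'),(1,20,'2'),(1,50,'3'),(1,70,'4'),(2,1,'5'),
  (2,120,'6'),(2,90,'7'),(1,90,'8'),(2,100,'9'),
  -- hypothetical worker 3: present on days 1-3, never again
  (3,1,'10'),(3,2,'11'),(3,3,'12');
""")

# "visits" = real visits plus one sentinel row per worker at
# :tomorrow, so the gap after the last real visit is measured too.
TRAILING_GAP_SQL = """
WITH visits AS (
    SELECT id, day FROM pp
    UNION
    SELECT DISTINCT id, :tomorrow FROM pp
)
SELECT id, MAX(next_day - day) AS max_delta
FROM (
    SELECT a.id, a.day, MIN(b.day) AS next_day
    FROM visits AS a
    JOIN visits AS b ON b.id = a.id AND b.day > a.day
    GROUP BY a.id, a.day
) AS gaps
GROUP BY id
HAVING MAX(next_day - day) > 50
ORDER BY id
"""

# 130 is an assumed value for "tomorrow's day count".
rows = conn.execute(TRAILING_GAP_SQL, {"tomorrow": 130}).fetchall()
print(rows)  # (id, max_delta) for workers with a gap > 50
```

With these assumptions worker 3 is now reported (gap of 127 days from day 3 to the sentinel) alongside worker 2, while worker 1 stays out because the longest gap, from day 90 to day 130, is only 40 days.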