I have been struggling with a problem that should actually be pretty simple, but after a full week of reading, googling and experimenting, my colleague and I cannot find a proper solution. 🙁
The problem: we have a table with two columns:
an employee number (P_ID, int) <-- identification of the employee
a date (starttime, datetime) <-- time the employee checked in
- We need to know what periods each employee has been working.
- When two dates are less than @gap days apart, they belong to the same period.
- For each employee there can be multiple records for any given day, but I only need to know on which dates he worked; I am not interested in the time part.
- As soon as there is a gap > @gap days, the next date is considered the start of a new range.
- A range is at least 1 day long (example: 21-09-2011 | 21-09-2011) but has no maximum length. (An employee checking in every @gap - 1 days should result in a single period from the first day he checked in until today.)
What we think we need are the islands in this table, where a new island starts whenever the gap in days is greater than @gap (@gap = 30 means 30 days).
So an example:
SOURCETABLE:
P_ID | starttime
------|------------------
12121 | 24-03-2009 7:30
12121 | 24-03-2009 14:25
12345 | 27-06-2011 10:00
99999 | 01-05-2012 4:50
12345 | 27-06-2011 10:30
12345 | 28-06-2011 11:00
98765 | 13-04-2012 10:00
12345 | 21-07-2011 9:00
99999 | 03-05-2012 23:15
12345 | 21-09-2011 12:00
45454 | 12-07-2010 8:00
12345 | 21-09-2011 17:00
99999 | 06-05-2012 11:05
99999 | 20-05-2012 12:45
98765 | 26-04-2012 16:00
12345 | 07-07-2012 14:00
99999 | 01-06-2012 13:55
12345 | 13-08-2012 13:00
Now what I need as a result is:
PERIODS:
P_ID | Start | End
-------------------------------
12121 | 24-03-2009 | 24-03-2009
12345 | 27-06-2011 | 21-07-2011
12345 | 21-09-2011 | 21-09-2011
12345 | 07-07-2012 | (today) OR 13-08-2012 <-- (less than @gap days ago) OR (last date in table)
45454 | 12-07-2010 | 12-07-2010
45454 | 17-06-2012 | 17-06-2012
98765 | 13-04-2012 | 26-04-2012
99999 | 01-05-2012 | 01-06-2012
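
To pin the rules down independently of any SQL dialect, here is a minimal Python sketch of the grouping logic, run on a few rows from the source table above. It is not a database solution and is not tuned for millions of rows. The names `work_periods` and `ts` are made up for illustration, and treating a difference of exactly @gap days as "same period" is an assumption, since the rules only cover "less than" and "greater than". The open-ended "(today)" end is left out: each period simply ends on its last check-in date.

```python
from datetime import datetime
from itertools import groupby


def work_periods(rows, gap_days):
    """Collapse (p_id, starttime) check-ins into (p_id, start, end) date ranges."""
    # Drop the time part, de-duplicate, and sort by employee and then date.
    days = sorted({(p_id, stamp.date()) for p_id, stamp in rows})
    periods = []
    for p_id, group in groupby(days, key=lambda row: row[0]):
        start = end = None
        for _, day in group:
            if start is None:
                # First check-in for this employee opens the first period.
                start = end = day
            elif (day - end).days > gap_days:
                # Gap is too large: close the current period, open a new one.
                periods.append((p_id, start, end))
                start = end = day
            else:
                # Still within the allowed gap (assumption: exactly gap_days counts).
                end = day
        periods.append((p_id, start, end))
    return periods


def ts(text):
    """Parse a timestamp written like the sample table, e.g. '24-03-2009 7:30'."""
    return datetime.strptime(text, "%d-%m-%Y %H:%M")


# A subset of the rows from the source table.
sample = [
    (12121, ts("24-03-2009 7:30")),
    (12121, ts("24-03-2009 14:25")),
    (12345, ts("27-06-2011 10:00")),
    (99999, ts("01-05-2012 4:50")),
    (12345, ts("27-06-2011 10:30")),
    (12345, ts("28-06-2011 11:00")),
    (12345, ts("21-07-2011 9:00")),
    (99999, ts("03-05-2012 23:15")),
    (12345, ts("21-09-2011 12:00")),
    (12345, ts("21-09-2011 17:00")),
    (99999, ts("06-05-2012 11:05")),
    (99999, ts("20-05-2012 12:45")),
    (99999, ts("01-06-2012 13:55")),
]

for p_id, start, end in work_periods(sample, gap_days=30):
    print(p_id, "|", start.strftime("%d-%m-%Y"), "|", end.strftime("%d-%m-%Y"))
# 12121 | 24-03-2009 | 24-03-2009
# 12345 | 27-06-2011 | 21-07-2011
# 12345 | 21-09-2011 | 21-09-2011
# 99999 | 01-05-2012 | 01-06-2012
```

The key step is comparing each distinct date with the previous date for the same employee. In SQL the same comparison is usually done with a self-join or a window function over the de-duplicated dates.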
I hope this is clear. Thank you for reading this far; it would be great if you could contribute!
Jon most definitely pointed us in the right direction. Performance was horrible, though (4 million+ records in the database), and it looked like we were missing some information. With everything we learned from you, we came up with the solution below. It uses elements of all the proposed answers and cycles through 3 temp tables before finally producing results, but performance is good enough, and so is the data it generates.