I’m trying to determine the total elapsed time to complete a set of processes for a multithreaded application which keeps track of start and end times in a table. The easiest way to describe my problem is with an example.
Here’s a dumbed-down version of the table (we’ll call it processes) I’m working with:
| id | start_date | end_date |
---------------------------------------------------
| 1 | 07/15/2011 12:00:00 | 07/15/2011 12:01:00 |
| 2 | 07/15/2011 12:00:00 | 07/15/2011 12:02:00 |
| 3 | 07/15/2011 12:00:00 | 07/15/2011 12:03:00 |
| 4 | 07/15/2011 12:01:00 | 07/15/2011 12:05:00 |
| 5 | 07/15/2011 12:01:00 | 07/15/2011 12:03:00 |
| 6 | 07/15/2011 12:03:00 | 07/15/2011 12:04:00 |
| 7 | 07/15/2011 12:03:00 | 07/15/2011 12:07:00 |
| 8 | 07/15/2011 12:03:00 | 07/15/2011 12:06:00 |
| 9 | 07/15/2011 12:04:00 | 07/15/2011 12:05:00 |
| 10 | 07/15/2011 12:05:00 | 07/15/2011 12:07:00 |
| 11 | 07/15/2011 12:08:00 | 07/15/2011 12:09:00 |
With such a small sample of data it’s easy enough to visualize this (for the purposes of this question, I’m assuming a thread can finish one process and instantly pick up the next with no overhead):
12:XX:   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
Thread1:   1---4---------------10------]   11--]
Thread2:   2-------]   6---9---]
Thread3:   3-----------7---------------]
Thread4:       5-------8-----------]
And from there you can easily tell that the total time spent working on the 11 processes was 8 minutes.
The problem arises because I am dealing with thousands of records, and there are some periods of time where no processing is happening at all.
How can I get this result using a PL/SQL query selecting from the table?
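Unless I’m missing something, all you need is the difference between the lowest start date and the highest end date, expressed in minutes. In Oracle, subtracting one DATE value from another gives a number of days, so multiplying that difference by 1440 returns the time elapsed in minutes.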
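As a rough sketch of that min/max calculation, here it is run against the question’s sample rows. I’m using Python’s built-in sqlite3 as a stand-in for Oracle purely so the example is runnable; the ISO-formatted timestamps and the helper names `load_sample` and `wall_clock_minutes` are my own assumptions, not part of the original schema.

```python
import sqlite3

# Sample rows from the question as (id, start, end), times on 2011-07-15.
ROWS = [
    (1, "12:00", "12:01"), (2, "12:00", "12:02"), (3, "12:00", "12:03"),
    (4, "12:01", "12:05"), (5, "12:01", "12:03"), (6, "12:03", "12:04"),
    (7, "12:03", "12:07"), (8, "12:03", "12:06"), (9, "12:04", "12:05"),
    (10, "12:05", "12:07"), (11, "12:08", "12:09"),
]


def load_sample():
    """Build an in-memory table; dates are ISO text (an assumption) so SQLite can parse them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processes (id INTEGER, start_date TEXT, end_date TEXT)")
    conn.executemany(
        "INSERT INTO processes VALUES (?, ?, ?)",
        [(i, f"2011-07-15 {s}:00", f"2011-07-15 {e}:00") for i, s, e in ROWS],
    )
    return conn


def wall_clock_minutes(conn):
    """Minutes between the earliest start and the latest end, idle gaps included."""
    sql = """
        SELECT (CAST(strftime('%s', MAX(end_date))   AS INTEGER)
              - CAST(strftime('%s', MIN(start_date)) AS INTEGER)) / 60
        FROM processes
    """
    return conn.execute(sql).fetchone()[0]


print(wall_clock_minutes(load_sample()))
```

For the sample data this prints 9 rather than 8, because the idle minute between 12:07 and 12:08 is counted as well.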
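Based on the comment, it seems that you want the sum of the time elapsed for each process? If that’s the case, it’s just a smallish variation on the earlier answer: sum the difference between each row’s end date and start date instead of comparing the overall minimum and maximum.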
Obviously this answer doesn’t deal with identifying which processes to get the total for, but that’s not addressed in the question at all, so I’m assuming that’s not an issue.
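A rough sketch of the summed variation, again with Python’s sqlite3 standing in for Oracle (my assumption, for runnability). The ISO timestamps and the names `load_sample` and `summed_minutes` are illustrative only.

```python
import sqlite3

# Sample rows from the question as (id, start, end), times on 2011-07-15.
ROWS = [
    (1, "12:00", "12:01"), (2, "12:00", "12:02"), (3, "12:00", "12:03"),
    (4, "12:01", "12:05"), (5, "12:01", "12:03"), (6, "12:03", "12:04"),
    (7, "12:03", "12:07"), (8, "12:03", "12:06"), (9, "12:04", "12:05"),
    (10, "12:05", "12:07"), (11, "12:08", "12:09"),
]


def load_sample():
    """Build an in-memory table; dates are ISO text (an assumption) so SQLite can parse them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processes (id INTEGER, start_date TEXT, end_date TEXT)")
    conn.executemany(
        "INSERT INTO processes VALUES (?, ?, ?)",
        [(i, f"2011-07-15 {s}:00", f"2011-07-15 {e}:00") for i, s, e in ROWS],
    )
    return conn


def summed_minutes(conn):
    """Add up every process's own duration; overlapping work is counted once per process."""
    sql = """
        SELECT SUM(CAST(strftime('%s', end_date)   AS INTEGER)
                 - CAST(strftime('%s', start_date) AS INTEGER)) / 60
        FROM processes
    """
    return conn.execute(sql).fetchone()[0]


print(summed_minutes(load_sample()))
```

On the sample data this prints 24, which is the total thread time, not the 8 minutes of wall-clock time during which at least one process was running.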
I see now that this doesn’t provide the answer that you’re looking for either, because it counts time worked by multiple processes multiple times. I believe the following approach will work. It is based on @Rajesh’s sample data from the other answer.
Basically, we’re joining each row to every other row that overlaps with it and taking the earliest start time and latest end time from each of those pairings. Once we have that, we can use DISTINCT to remove the duplicates, which will give us the distinct time periods.
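Here is one way to read that join, sketched with Python’s sqlite3 in place of Oracle. Grouping by the left-hand row’s id is my interpretation of “each of those pairings”, and the three sample rows are hypothetical (not @Rajesh’s data), as are the names `load_rows` and `distinct_periods`.

```python
import sqlite3

# Hypothetical rows: two overlapping processes and one isolated process.
ROWS = [
    (1, "2011-07-15 12:00:00", "2011-07-15 12:02:00"),
    (2, "2011-07-15 12:01:00", "2011-07-15 12:03:00"),
    (3, "2011-07-15 12:05:00", "2011-07-15 12:06:00"),
]


def load_rows(rows):
    """Build an in-memory table; dates are ISO text so string comparison matches time order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processes (id INTEGER, start_date TEXT, end_date TEXT)")
    conn.executemany("INSERT INTO processes VALUES (?, ?, ?)", rows)
    return conn


def distinct_periods(conn):
    """For each row, span all rows overlapping it, then drop duplicate spans."""
    sql = """
        SELECT DISTINCT MIN(b.start_date), MAX(b.end_date)
        FROM processes AS a
        JOIN processes AS b
          ON b.start_date <= a.end_date      -- b starts before a ends
         AND b.end_date   >= a.start_date    -- b ends after a starts
        GROUP BY a.id
        ORDER BY 1
    """
    return conn.execute(sql).fetchall()


for period in distinct_periods(load_rows(ROWS)):
    print(period)
```

Rows 1 and 2 both collapse to the 12:00 to 12:03 period, and row 3 stays on its own, so two distinct periods come back.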
I believe there is still one flaw here, which is cascading time periods: period A overlaps with period B and period B overlaps with period C, but period C does not overlap with period A. I think this issue is solvable; I just haven’t quite figured it out yet.
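To make the cascading case concrete, here is a small plain-Python mirror of the single self-join. The periods A, B and C are hypothetical, expressed as minutes past 12:00, and the helper names are my own.

```python
# Hypothetical periods as (start, end) in minutes past 12:00.
A, B, C = (0, 2), (1, 4), (3, 5)


def overlaps(p, q):
    """Two periods overlap when each one starts no later than the other ends."""
    return p[0] <= q[1] and q[0] <= p[1]


def joined_span(row, rows):
    """Earliest start and latest end among the rows that overlap `row`."""
    hits = [r for r in rows if overlaps(row, r)]
    return (min(s for s, _ in hits), max(e for _, e in hits))


rows = [A, B, C]

# A overlaps B and B overlaps C, but A and C never touch.
print(overlaps(A, B), overlaps(B, C), overlaps(A, C))

# One pass of the join yields three different, mutually overlapping spans.
spans = sorted({joined_span(r, rows) for r in rows})
print(spans)
print(sum(e - s for s, e in spans))
```

The single pass leaves the spans (0, 4), (0, 5) and (1, 5), which add up to 13 minutes even though the three periods together only cover 5.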
Okay, one more time: rather than joining once between the table and itself, this version uses a recursive join to follow all of the cascades to their end point.
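A sketch of the recursive idea, with the caveat that it is not the original query: it uses a recursive common table expression in SQLite (via Python’s sqlite3) rather than Oracle’s own hierarchical syntax, and the names `load_sample`, `total_busy_minutes`, `p`, `reach` and `islands` are all my own. Each chain starts at a row whose start is not covered by any earlier row and keeps extending its end for as long as some row overlaps it.

```python
import sqlite3

# Sample rows from the question as (id, start, end), times on 2011-07-15.
ROWS = [
    (1, "12:00", "12:01"), (2, "12:00", "12:02"), (3, "12:00", "12:03"),
    (4, "12:01", "12:05"), (5, "12:01", "12:03"), (6, "12:03", "12:04"),
    (7, "12:03", "12:07"), (8, "12:03", "12:06"), (9, "12:04", "12:05"),
    (10, "12:05", "12:07"), (11, "12:08", "12:09"),
]


def load_sample():
    """Build an in-memory table; dates are ISO text (an assumption) so SQLite can parse them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processes (id INTEGER, start_date TEXT, end_date TEXT)")
    conn.executemany(
        "INSERT INTO processes VALUES (?, ?, ?)",
        [(i, f"2011-07-15 {s}:00", f"2011-07-15 {e}:00") for i, s, e in ROWS],
    )
    return conn


def total_busy_minutes(conn):
    """Minutes during which at least one process was running."""
    sql = """
        WITH RECURSIVE
          -- Start and end as epoch seconds, so the arithmetic is exact.
          p AS (
            SELECT CAST(strftime('%s', start_date) AS INTEGER) AS s,
                   CAST(strftime('%s', end_date)   AS INTEGER) AS e
            FROM processes
          ),
          reach(root_s, cur_e) AS (
            -- Roots: rows whose start is not covered by an earlier-starting row.
            SELECT r.s, r.e FROM p AS r
            WHERE NOT EXISTS (
              SELECT 1 FROM p AS q WHERE q.s < r.s AND q.e >= r.s
            )
            UNION
            -- Follow the cascade: any row starting by cur_e that ends later.
            SELECT reach.root_s, p.e
            FROM reach JOIN p ON p.s <= reach.cur_e AND p.e > reach.cur_e
          ),
          -- One merged period per root: the furthest end the cascade reached.
          islands AS (
            SELECT root_s, MAX(cur_e) AS island_e FROM reach GROUP BY root_s
          )
        SELECT SUM(island_e - root_s) / 60 FROM islands
    """
    return conn.execute(sql).fetchone()[0]


print(total_busy_minutes(load_sample()))
```

On the question’s sample this prints 8: one merged period from 12:00 to 12:07 and another from 12:08 to 12:09. The recursion stops on its own because each step must strictly increase the chain’s end time. I haven’t measured how it behaves on thousands of rows.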