I’m working on a simple time tracking app.
I’ve created a table that logs the IN and OUT times of employees.
Here is an example of how my data currently looks:
E_ID | In_Out | Date_Time
------------------------------------
3 | I | 2012-08-19 15:41:52
3 | O | 2012-08-19 17:30:22
1 | I | 2012-08-19 18:51:11
3 | I | 2012-08-19 18:55:52
1 | O | 2012-08-19 20:41:52
3 | O | 2012-08-19 21:50:30
I’m trying to create a query that will pair the IN and OUT times of an employee into one row, like this:
E_ID | In_Time | Out_Time
------------------------------------------------
3 | 2012-08-19 15:41:52 | 2012-08-19 17:30:22
3 | 2012-08-19 18:55:52 | 2012-08-19 21:50:30
1 | 2012-08-19 18:51:11 | 2012-08-19 20:41:52
I hope I’m being clear in what I’m trying to achieve here.
Basically, I want to generate a report that has both the in and out times merged into one row.
Any help with this would be greatly appreciated.
Thanks in advance.
There are three basic approaches I can think of.
One uses a theta-JOIN, another uses a correlated subquery in the SELECT list, and the third makes use of MySQL user variables.
theta-JOIN
One approach is to use a theta-JOIN. This is a generic SQL approach (no MySQL-specific syntax), so it can work with multiple RDBMSs.
N.B. With a large number of rows, this approach can create a very large intermediate result set, which can lead to problematic performance.
The idea is to match every ‘O’ row for an employee with every ‘I’ row for that employee that is earlier, and then use the MAX aggregate to pick out the ‘I’ row with the closest (that is, the latest) date_time.
This works for perfectly paired data, but it could produce odd results for imperfect pairs. For example, two consecutive ‘O’ rows with no intermediate ‘I’ row will both get matched to the same ‘I’ row.
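As a rough sketch of the theta-JOIN idea, here is one way the query could look. The table name `timelog` is an assumption (the question does not name the table), and the Python `sqlite3` wrapper is only there so the statement can be run against the sample rows. The SQL itself uses nothing vendor-specific, so it should carry over to MySQL, but treat it as a starting point rather than a tested MySQL query.

```python
import sqlite3

# Sample rows from the question: (e_id, in_out, date_time)
ROWS = [
    (3, "I", "2012-08-19 15:41:52"),
    (3, "O", "2012-08-19 17:30:22"),
    (1, "I", "2012-08-19 18:51:11"),
    (3, "I", "2012-08-19 18:55:52"),
    (1, "O", "2012-08-19 20:41:52"),
    (3, "O", "2012-08-19 21:50:30"),
]

# Each 'O' row is joined to every earlier 'I' row of the same employee;
# MAX() then keeps the latest of those 'I' times.
# "timelog" is a hypothetical table name.
THETA_JOIN_SQL = """
SELECT o.e_id,
       MAX(i.date_time) AS in_time,
       o.date_time      AS out_time
  FROM timelog o
  LEFT JOIN timelog i
    ON i.e_id = o.e_id
   AND i.in_out = 'I'
   AND i.date_time < o.date_time
 WHERE o.in_out = 'O'
 GROUP BY o.e_id, o.date_time
 ORDER BY o.e_id, o.date_time
"""


def pair_with_theta_join(rows):
    """Load rows into an in-memory table and run the theta-JOIN query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE timelog (e_id INTEGER, in_out TEXT, date_time TEXT)"
    )
    conn.executemany("INSERT INTO timelog VALUES (?, ?, ?)", rows)
    result = conn.execute(THETA_JOIN_SQL).fetchall()
    conn.close()
    return result


for row in pair_with_theta_join(ROWS):
    print(row)
```

The LEFT JOIN is a deliberate choice here: an ‘O’ row with no earlier ‘I’ row still shows up, with a NULL in_time, which makes unpaired rows easier to spot.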
correlated subquery in SELECT list
Another approach is to use a correlated subquery in the SELECT list. This can have sub-optimal performance, but it is sometimes workable, and it is occasionally the fastest way to return the specified result set. This approach works best when the outer query returns a limited number of rows.
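A minimal sketch of the correlated-subquery form, under the same assumptions as before: `timelog` is a made-up table name, and SQLite (via Python's `sqlite3`) stands in for MySQL so the statement can be exercised on the sample rows. For each ‘O’ row, the subquery looks up the latest earlier ‘I’ time for the same employee.

```python
import sqlite3

# Sample rows from the question: (e_id, in_out, date_time)
ROWS = [
    (3, "I", "2012-08-19 15:41:52"),
    (3, "O", "2012-08-19 17:30:22"),
    (1, "I", "2012-08-19 18:51:11"),
    (3, "I", "2012-08-19 18:55:52"),
    (1, "O", "2012-08-19 20:41:52"),
    (3, "O", "2012-08-19 21:50:30"),
]

# The subquery is evaluated once per 'O' row returned by the outer query.
# "timelog" is a hypothetical table name.
SUBQUERY_SQL = """
SELECT o.e_id,
       (SELECT MAX(i.date_time)
          FROM timelog i
         WHERE i.e_id = o.e_id
           AND i.in_out = 'I'
           AND i.date_time < o.date_time) AS in_time,
       o.date_time AS out_time
  FROM timelog o
 WHERE o.in_out = 'O'
 ORDER BY o.e_id, o.date_time
"""


def pair_with_subquery(rows):
    """Load rows into an in-memory table and run the subquery version."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE timelog (e_id INTEGER, in_out TEXT, date_time TEXT)"
    )
    conn.executemany("INSERT INTO timelog VALUES (?, ?, ?)", rows)
    result = conn.execute(SUBQUERY_SQL).fetchall()
    conn.close()
    return result


for row in pair_with_subquery(ROWS):
    print(row)
```

Because the subquery runs per outer row, an index covering (e_id, in_out, date_time) would likely matter on a real table; that is a guess about typical behavior, not something measured here.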
User variables
Another approach is to make use of MySQL user variables. (This is a MySQL-specific approach, and is a workaround to the “missing” analytic functions.)
The idea is to order all of the rows by e_id, then by date_time, so we can process them in order. Whenever we encounter an ‘O’ (out) row, we use the value of date_time from the immediately preceding ‘I’ row as the in_time.
N.B.: This usage of MySQL user variables is dependent on MySQL performing operations in a specific order, a predictable plan. The use of the inline views (or “derived tables”, in MySQL parlance) gets us a predictable execution plan. But this behavior is subject to change in future releases of MySQL.
This works for the set of data you have, but it needs more thorough testing and tweaking to ensure you get the result set you want with quirky data, when the rows are not perfectly paired (e.g. two ‘O’ rows with no ‘I’ row between them, an ‘I’ row with no subsequent ‘O’ row, etc.)
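To make the row-by-row logic concrete, here is the same ordered scan written in plain Python. This is not the MySQL user-variable syntax, and it is not meant as a replacement for it. It just carries a "last seen in_time" value across sorted rows, which is the role the user variables play, so the pairing behavior can be checked without depending on MySQL's evaluation order. The function name `pair_in_out` is made up for this sketch.

```python
# Sample rows from the question: (e_id, in_out, date_time)
ROWS = [
    (3, "I", "2012-08-19 15:41:52"),
    (3, "O", "2012-08-19 17:30:22"),
    (1, "I", "2012-08-19 18:51:11"),
    (3, "I", "2012-08-19 18:55:52"),
    (1, "O", "2012-08-19 20:41:52"),
    (3, "O", "2012-08-19 21:50:30"),
]


def pair_in_out(rows):
    """Pair each 'O' row with the most recent 'I' row of the same employee."""
    pairs = []
    prev_id = None   # employee seen on the previous row
    in_time = None   # most recent 'I' time for the current employee
    # Process rows ordered by e_id, then date_time (in_out breaks ties).
    for e_id, in_out, date_time in sorted(rows, key=lambda r: (r[0], r[2], r[1])):
        if e_id != prev_id:
            in_time = None  # new employee: forget the previous one's 'I' time
        if in_out == "I":
            in_time = date_time
        elif in_out == "O":
            pairs.append((e_id, in_time, date_time))
        prev_id = e_id
    return pairs


for pair in pair_in_out(ROWS):
    print(pair)
```

As written, two consecutive ‘O’ rows reuse the same in_time, and an ‘I’ row with no later ‘O’ row is dropped silently. Those are exactly the quirky cases worth deciding on before using any of these approaches for a real report.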