I have two tables, like this:
master
---------
empcode INT PRIMARY KEY
name VARCHAR
dept VARCHAR
emp_tx
----------
empcode INT references MASTER(empcode)
s_date DATETIME
The emp_tx table records the employee “in” and “out” transactions. The column s_date stores the time (as a DATETIME value) when the “in” or “out” event occurred. The transactions are recorded from the office region (through a fingerprint biometric system).
Example data from the emp_tx table:
empcode s_date
------- ------------------
1110 2012-12-12 09:31:42 (employee in time to the office)
1110 2012-12-12 13:34:17 (employee out time for lunch)
1110 2012-12-12 14:00:17 (employee in time after lunch)
1110 2012-12-12 18:00:12 (employee out time after working hours)
1112
etc.
Note:
If an employee is absent from the office on a given day, then no row will be inserted into the emp_tx transaction table for that date. An absence of an employee on a given date will be indicated by a row “missing” for that employee and that date.
Can anyone help me to get a SQL Query that returns the dates that employees were absent, to produce an Employee Absent Report?
The input to the query will be two DATE values, a “from” date and a “to” date, which specify a range of dates. The query should return all occurrences of “absence” (or rather, non-occurrences): every case where no row is found in the emp_tx table for an empcode on a date between the “from” and “to” dates.
Expected output:
If we input ‘2012-12-12’ as the “from” date, and ‘2012-12-20’ as the “to” date, the query should return rows something like this:
Empcode  EmpName  Department  AbsentDate  TotalNoofAbsent days
-------  -------  ----------  ----------  --------------------
1110     ABC      Accounts    2012-12-12
1110     ABC      Accounts    2012-12-14  2
1112     xyz      Software    2012-12-19
1112     xyz      Software    2012-12-17  2
I’ve tried this query, and I am sure it is not returning the rows I want:
select tx.date from Emp_TX as tx where Date(S_Date) not between '2012-12-23' and '2012-12-30'
Thanks.
If an “absence” is defined as the non-appearance of a row in the emp_tx table for a particular empcode for a particular date (date = midnight to midnight, a 24-hour period), and…

If it’s acceptable to not show an “absence” for a date when there are NO transactions in the emp_tx table for that date (i.e. exclude a date when ALL empcodes are absent on that date), then…

You can get the first four columns of the specified result set with a query like this: (untested)
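The query text itself is not shown here. What follows is a sketch reconstructed from the explanation under “How the query works”, so it may differ from the original. The aliases d and p follow that explanation; the alias m, the output column names, and the literal dates (taken from the question’s example range) are assumptions, and like the original it is untested.

```sql
-- Sketch only (untested). Alias m and the date literals are illustrative assumptions.
SELECT m.empcode AS Empcode
     , m.name    AS EmpName
     , m.dept    AS Department
     , d.dt      AS AbsentDate
  FROM ( -- inline view: distinct dates in the range that have at least one transaction
         SELECT DATE(t.s_date) AS dt
           FROM emp_tx t
          WHERE t.s_date >= '2012-12-12'                   -- "from" date
            AND t.s_date <  '2012-12-20' + INTERVAL 1 DAY  -- "to" date, inclusive
          GROUP BY DATE(t.s_date)
       ) d
 CROSS JOIN master m            -- every empcode paired with every date
  LEFT JOIN emp_tx p            -- look for any transaction on that date
    ON p.empcode = m.empcode
   AND p.s_date >= d.dt
   AND p.s_date <  d.dt + INTERVAL 1 DAY
 WHERE p.empcode IS NULL        -- keep only the pairs with no match
 ORDER BY m.empcode, d.dt;
```

In an application, the two date literals would normally be bind parameters rather than hard-coded strings.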
Getting that fifth column, TotalNoofAbsent, returned in the same resultset is possible, but it’s going to make that query really messy. This detail might be more efficiently handled on the client side, when processing the returned resultset.

How the query works
The inline view aliased as d gets us a set of “date” values that we are checking. Using the emp_tx table as a source of these “date” values is a convenient way to do this. Note the DATE() function is returning just the “date” portion of the DATETIME argument; we’re using a GROUP BY to get a distinct list of dates (i.e. no duplicate values). (What we’re after, with this inline view query, is a distinct set of DATE values between the two values passed in as arguments. There are other, more involved, ways of generating a list of DATE values.)

As long as every “date” value that you will consider as an “absence” appears somewhere in the table (that is, at least one empcode had one transaction on each date that is of interest), and as long as the number of rows in the emp_tx table isn’t excessive, then the inline view query will work reasonably well.

(NOTE: The query in the inline view can be run separately, to verify that the results are correct and as we expect.)
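For instance, a standalone version of such an inline view might look like the sketch below. The alias t, the column alias dt, and the date literals are assumptions for illustration; the result should be one row per calendar date in the range on which at least one transaction exists.

```sql
-- Sketch (untested): distinct transaction dates within the requested range.
SELECT DATE(t.s_date) AS dt
  FROM emp_tx t
 WHERE t.s_date >= '2012-12-12'                   -- "from" date
   AND t.s_date <  '2012-12-20' + INTERVAL 1 DAY  -- "to" date, inclusive
 GROUP BY DATE(t.s_date)
 ORDER BY dt;
```

Any date missing from this list (for example, a day on which nobody clocked in) will not be reported as an absence for anyone.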
The next step is to take the results from the inline view and perform a CROSS JOIN operation (to generate a Cartesian product) to match EVERY empcode with EVERY date returned from the inline view. The result of this operation represents every possible occurrence of “attendance”.

The final step in the query is to perform an “anti-join” operation, using a LEFT JOIN and a WHERE IS NULL predicate. The LEFT JOIN (outer join) returns every possible attendance occurrence (from the left side), INCLUDING those that don’t have a matching row (attendance record) from the emp_tx table.

The “trick” is to include a predicate (in the WHERE clause) that discards all of the rows where a matching attendance record was found, so that what we are left with is all combinations of empcode and date (possible attendance occurrences) where there was NO MATCHING attendance transaction.

(NOTE: I’ve purposefully left the references to the s_date (DATETIME) column “bare” in the predicates, and used range predicates. This will allow MySQL to make effective use of an appropriate index that includes that column.)
If we were to wrap the column references in the predicates inside a function, e.g. DATE(p.s_date), then MySQL wouldn’t be able to make effective use of an index on the s_date column.

As one of the comments (on your question) points out, we’re not making any distinction between transactions that mark an employee either as “coming in” or “going out”. We are ONLY looking for the existence of a transaction for that empcode in a given 24-hour “midnight to midnight” period.
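Returning to the point about bare column references, the two predicate forms can be compared side by side. This is a sketch with an illustrative date literal; both statements count the same rows, and running each under EXPLAIN (with an index on s_date in place) is one way to see how MySQL treats them.

```sql
-- Bare column with a range predicate: an index on s_date can be used for a range scan.
SELECT COUNT(*)
  FROM emp_tx p
 WHERE p.s_date >= '2012-12-12'
   AND p.s_date <  '2012-12-12' + INTERVAL 1 DAY;

-- Column wrapped in a function: DATE() has to be evaluated for each row,
-- so an ordinary index on s_date cannot be used to narrow the range.
SELECT COUNT(*)
  FROM emp_tx p
 WHERE DATE(p.s_date) = '2012-12-12';
```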
There are other approaches to getting the same result set, but the “anti-join” pattern usually turns out to give the best performance with large sets.
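One such alternative, shown only as a sketch, replaces the LEFT JOIN ... IS NULL pair with a correlated NOT EXISTS subquery. The aliases t and m, and the date literals, are assumptions for illustration, and the query is untested.

```sql
-- Sketch (untested): the same absences expressed with NOT EXISTS.
SELECT m.empcode
     , m.name
     , m.dept
     , d.dt AS AbsentDate
  FROM ( SELECT DATE(t.s_date) AS dt
           FROM emp_tx t
          WHERE t.s_date >= '2012-12-12'
            AND t.s_date <  '2012-12-20' + INTERVAL 1 DAY
          GROUP BY DATE(t.s_date)
       ) d
 CROSS JOIN master m
 WHERE NOT EXISTS
       ( -- any transaction for this employee on this date?
         SELECT 1
           FROM emp_tx p
          WHERE p.empcode = m.empcode
            AND p.s_date >= d.dt
            AND p.s_date <  d.dt + INTERVAL 1 DAY
       )
 ORDER BY m.empcode, d.dt;
```

Which form performs better depends on the MySQL version and the data, so it is worth comparing the EXPLAIN output of both.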
For best performance, you’ll likely want covering indexes:
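The index definitions are not shown here, so the following is a guess based on the shape of the query rather than the original recommendation. The index names are hypothetical. The first index would serve the date-range scan that builds the list of dates, and the second would serve the per-employee, per-date lookup in the anti-join.

```sql
-- Hypothetical index names; column order is an assumption based on the predicates used.
CREATE INDEX emp_tx_ix1 ON emp_tx (s_date, empcode);
CREATE INDEX emp_tx_ix2 ON emp_tx (empcode, s_date);
```

Since both indexes contain every emp_tx column the query touches, MySQL may be able to satisfy it from the indexes alone, without reading the table rows.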