I have a report I’m rewriting for an application that uses MySQL as the database. Currently, the report relies on a lot of grunt work done in PHP, which builds arrays, stores them in a temporary database, and then generates the results from that temporary database.
One of the main goals of rewriting the bulk of this code is to simplify and clean up my old code, so I am wondering whether the process below can be simplified or, even better, done solely in MySQL, leaving PHP to handle only the distribution of the data to the client.
I will use a made-up scenario to describe what I am attempting to do:
Let’s assume the following table (please note that in the real app this table’s information is actually pulled from several tables, but this should get the point across):
+----+----------+--------------+--------------+
| id | location | date_visited | time_visited |
+----+----------+--------------+--------------+
|  1 | place 1  | 2012-04-20   | 11:00:00     |
|  2 | place 2  | 2012-04-20   | 11:06:00     |
|  3 | place 1  | 2012-04-20   | 11:06:00     |
|  4 | place 3  | 2012-04-20   | 11:20:00     |
|  5 | place 2  | 2012-04-20   | 11:21:00     |
|  6 | place 1  | 2012-04-20   | 11:22:00     |
|  7 | place 3  | 2012-04-20   | 11:23:00     |
+----+----------+--------------+--------------+
The report I need has to list each location together with the number of visits made to that place. The caveat, and what makes the query difficult for me, is that a minimum time interval must have elapsed since the previous visit for a visit to count within this report.
For example: Let’s say the required interval between visits to any given place is 10 minutes.
The first entry is locked in automatically because there are no previous entries, and so is the second, since there are no other entries for ‘place 2’ yet. On the third entry, however, place 1 is checked for the last time it was visited, which was less than the defined interval (10 minutes) ago, so the report ignores this entry and moves along to the next one.
In essence, we are checking case by case, where the time interval is measured not from the last entry overall, but from the last entry for the same location.
The results from the report should look something like this in the end:
+----+----------+--------+
| id | location | visits |
+----+----------+--------+
|  1 | place 1  |      2 |
|  2 | place 2  |      2 |
|  3 | place 3  |      1 |
+----+----------+--------+
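The counting rule described above can be sketched procedurally. This is only a minimal Python sketch of the rule, not a MySQL solution. The function name `count_visits`, the in-memory row list, the choice to count a visit when the gap is at least the interval, and the choice to measure the gap from the previous visit to the same location (whether or not that visit was itself counted) are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Sample rows from the table above: (id, location, date_visited, time_visited).
ROWS = [
    (1, "place 1", "2012-04-20", "11:00:00"),
    (2, "place 2", "2012-04-20", "11:06:00"),
    (3, "place 1", "2012-04-20", "11:06:00"),
    (4, "place 3", "2012-04-20", "11:20:00"),
    (5, "place 2", "2012-04-20", "11:21:00"),
    (6, "place 1", "2012-04-20", "11:22:00"),
    (7, "place 3", "2012-04-20", "11:23:00"),
]


def count_visits(rows, interval=timedelta(minutes=10)):
    """Count visits per location, skipping any visit that falls within
    `interval` of the previous visit to the same location.

    Assumption: the gap is measured from the previous visit to that
    location, whether or not that previous visit was itself counted.
    """
    last_seen = {}  # location -> datetime of its most recent visit
    counts = {}     # location -> number of counted visits
    # Process rows chronologically (id breaks ties between equal timestamps).
    ordered = sorted(rows, key=lambda r: (r[2], r[3], r[0]))
    for _id, location, date_visited, time_visited in ordered:
        visited = datetime.strptime(
            f"{date_visited} {time_visited}", "%Y-%m-%d %H:%M:%S"
        )
        previous = last_seen.get(location)
        if previous is None or visited - previous >= interval:
            counts[location] = counts.get(location, 0) + 1
        last_seen[location] = visited
    return counts


print(count_visits(ROWS))
```

For the sample rows this gives 2 visits for place 1, 2 for place 2 and 1 for place 3, matching the expected report. If the interval should instead be measured from the last counted visit, only the line that updates `last_seen` would need to move inside the `if` branch; the sample data gives the same counts either way.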
My current implementation on a basic level goes through the following steps to acquire the above result set:
- MySQL query creates one temp table with a list of all the required locations and their ID.
- MySQL query selects all the visit data within the specified time frame and passes it to PHP.
- PHP & MySQL populate the temporary table with the visit data; PHP does the grunt work here.
- MySQL selects data from temporary table and returns it to client for display.
My question is: is there a way to do most of this with MySQL alone? What I’ve been trying to find is a way to write a MySQL query that goes through the selected rows, keeps only the visits that meet the above criteria, groups them by location, and gives me a COUNT(*) for each group.
I really don’t know whether it’s possible, and I’m hoping that one of the database gurus out there might be able to shed some light on how to do this.
Suppose you have a table (probably temporary) of a slightly different structure:
which, as you see, has an index on (location, visited). Then the following query will use the index, that is, read the data in the order of the index, and return the results you expected:
Result:
Some explanation:
The key to the solution is that it sets aside the functional nature of SQL and relies on MySQL implementation specifics (they say that is bad, never do it again!).
If a table has an index (an ordered representation of column values) and the index is used in a query, the data from the table is read in the order of the index.
A GROUP BY operation benefits from an index (since the data is already grouped there), and MySQL will choose the index if it is applicable.
All aggregate functions in SQL (except for COUNT(*), which has a special meaning) check each row and use the value only if it is not NULL (the expression within COUNT above returns NULL when the conditions are not met).
The rest is just a hacky representation of procedural iteration over a list of rows (which is read in the order of the index, that is, ordered by location asc, visited asc): I initialize some variables; if the location differs from the previous row, I count the row; if not, I check the interval and return NULL if it is too short.
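The procedural iteration described above can be mimicked outside the database. What follows is a Python sketch of the idea, not the MySQL query itself. The sorted list stands in for reading rows in the order of the index on (location, visited), the variables `prev_location` and `prev_visited` play the role of the state carried from one row to the next, and `None` stands in for NULL. All names are hypothetical, and measuring the gap from the previous row of the same location is an assumption.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical (location, visited) rows mirroring the sample data in the question.
VISITS = [
    ("place 1", datetime(2012, 4, 20, 11, 0)),
    ("place 2", datetime(2012, 4, 20, 11, 6)),
    ("place 1", datetime(2012, 4, 20, 11, 6)),
    ("place 3", datetime(2012, 4, 20, 11, 20)),
    ("place 2", datetime(2012, 4, 20, 11, 21)),
    ("place 1", datetime(2012, 4, 20, 11, 22)),
    ("place 3", datetime(2012, 4, 20, 11, 23)),
]


def visits_per_location(visits, interval=timedelta(minutes=10)):
    prev_location = None  # state carried from one row to the next
    prev_visited = None

    def counted_or_null(location, visited):
        """Return the timestamp if the row should count, else None (NULL)."""
        nonlocal prev_location, prev_visited
        new_group = location != prev_location
        # Short-circuit: the subtraction only runs within the same location.
        far_enough = new_group or visited - prev_visited >= interval
        prev_location, prev_visited = location, visited
        return visited if far_enough else None

    # Sorting stands in for reading rows in the order of an index on
    # (location, visited): each location's rows arrive together, oldest first.
    ordered = sorted(visits)
    result = {}
    for location, group in groupby(ordered, key=lambda row: row[0]):
        values = [counted_or_null(loc, vis) for loc, vis in group]
        # COUNT(expr) semantics: only non-NULL values are counted.
        result[location] = sum(value is not None for value in values)
    return result


print(visits_per_location(VISITS))
```

The sketch only works because the rows are processed in (location, visited) order. With unsorted input, `groupby` would split a location into several fragments and the carried state would compare unrelated rows, which is the same reason the query depends on the index being used.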