Problem – Retrieve sum of subtotals on a half hour interval efficiently
I am using MySQL and I have a table containing subtotals with different times. I want to retrieve the sum of these sales on a half hour interval from 7 am through 12 am. My current solution (below) works but takes 13 seconds to query about 150,000 records. I intend to have several million records in the future and my current method is too slow.
How can I make this more efficient or, if possible, replace the PHP component with pure SQL? Also, would a solution be more efficient if I used Unix timestamps instead of separate date and time columns?
Table Name – Receipts
    subtotal    date          time        sale_id
    ---------------------------------------------
    6           09/10/2011    07:20:33    1
    5           09/10/2011    07:28:22    2
    3           09/10/2011    07:40:00    3
    5           09/10/2011    08:05:00    4
    8           09/10/2011    08:44:00    5
    ...............
    10          09/10/2011    18:40:00    6
    5           09/10/2011    23:05:00    7
Desired Result
An array like this:
- Half hour 1 ::: (7:00 to 7:30) => Sum of Subtotal is 11
- Half hour 2 ::: (7:30 to 8:00) => Sum of Subtotal is 3
- Half hour 3 ::: (8:00 to 8:30) => Sum of Subtotal is 5
- Half hour 4 ::: (8:30 to 9:00) => Sum of Subtotal is 8
Current Method
The current method uses a for loop that starts at 7 am and increments by 1800 seconds (half an hour) on each pass. As a result, it makes about 34 queries to the database.
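    // One query per half-hour bucket, from 07:00:00 up to 23:59:59.
    for ($n = strtotime("07:00:00"), $e = strtotime("23:59:59"); $n <= $e; $n += 1800) {
        $timeA = date("H:i:s", $n);         // bucket start, e.g. 07:00:00
        $timeB = date("H:i:s", $n + 1799);  // bucket end, e.g. 07:29:59
        // Inclusive bounds, so rows that fall exactly on a bucket edge are counted.
        $query = $mySQL->query("SELECT SUM(subtotal)
                                FROM Receipts WHERE time >= '$timeA'
                                AND time <= '$timeB'");
        while ($row = $query->fetch_object()) {
            $sum[] = $row;
        }
    }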
Current Output
Output is just an array where:
- [0] represents 7 am to 7:30 am
- [1] represents 7:30 am to 8:00 am
- [33] represents 11:30 pm to 11:59:59 pm.

    array ("0" => 10000,
           "1" => 20000,
           ...
           "33" => 5000);
You can try this single query as well; it should return a result set with the totals in 30-minute groupings:
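As a minimal sketch of what such a single query could look like, assuming `time` is stored as a MySQL `TIME` column: each row's time is rounded down to the start of its half hour, and the subtotals are summed per bucket. The aliases `half_hour_start` and `total` are illustrative names, not part of the original schema.

```sql
-- Sketch only: assumes `time` is a TIME column in the Receipts table.
-- TIME_TO_SEC converts the time to seconds since midnight; dividing by 1800,
-- flooring, and multiplying back rounds it down to the start of its half hour.
SELECT
    SEC_TO_TIME(FLOOR(TIME_TO_SEC(`time`) / 1800) * 1800) AS half_hour_start,
    SUM(subtotal) AS total
FROM Receipts
WHERE `time` >= '07:00:00'          -- only buckets from 7 am onward
GROUP BY half_hour_start
ORDER BY half_hour_start;
```

Like the PHP loop, this sums across every date in the table; adding a condition on the `date` column to the `WHERE` clause would restrict it to a single day. Half hours with no sales produce no row at all, so any gaps would need to be filled in by the calling code.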
To run this efficiently, add a composite index on the date and time columns.
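One way to create such an index is sketched below. The index name `idx_receipts_date_time` is a made-up label; the column order puts `date` first on the assumption that queries filter on a specific date before looking at the time.

```sql
-- Sketch only: the index name is hypothetical; the columns come from the Receipts table above.
ALTER TABLE Receipts
    ADD INDEX idx_receipts_date_time (`date`, `time`);
```

Running `EXPLAIN` on the query afterwards is a quick way to check whether MySQL actually uses the index.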
You should get back a result set with one row per 30-minute grouping, each carrying the total for that grouping.