I need to return a specific number of rows from a query within a given start and stop time, using a dynamically calculated step interval.
I’ve kept it simple here with a table consisting of a unix timestamp and a corresponding integer value.
In my example, I need to have 200 rows returned with an INCLUSIVE start time of 1307455099 and an INCLUSIVE end time of 1307462455.
Here’s the current query I’ve developed so far. It uses the modulus of the total rows to calculate the step interval:
SELECT timestamp, value FROM soh_data
WHERE timestamp % (CAST((1307462455 - 1307455099)/200 AS SIGNED INTEGER)) = 0
AND timestamp BETWEEN 1307455099 AND 1307462455
ORDER BY timestamp;
The first problem is that because I’m using a modulus, the start and end times aren’t always inclusive (that’s solvable with an extra query… I’m fine with that).
The second, and more difficult issue to tackle, is that the total rows returned in this case is only 196. In most queries, it’s n-1.
FYI, this is on a MySQL database with millions of rows of data.
Any insights?
Since I’m fine with throwing away a few rows, but I’m not alright with too little data, I’ve come up with two different approaches.
First: I’ve decided to adapt my query to use FLOOR instead of CAST. In my example, the quotient of the division was 21.805. CAST rounded that up to 22. The right step interval for gathering at least 200 results was 21 (yielding 205 results). Using FLOOR gives me the step of 21 that I need. Unfortunately, I haven’t fully tested this to ensure consistent results across larger data sets.
Second: The more reliable solution is to pre-calculate the step in code. This way, I can zero in on the step programmatically. In the following example, I use Ruby for readability, but my ultimate solution will be coded in C++:
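What follows is only a minimal sketch of the pre-calculation idea, not the original code from this post. It assumes the timestamps are available as an in-memory array; in a real setup the hypothetical helper `rows_for_step` would be replaced by a `COUNT(*)` query against `soh_data` using the same `WHERE` clause. The names `rows_for_step` and `find_step` are made up for illustration. The search starts at the floored quotient and shrinks the step until at least the wanted number of rows match.

```ruby
# Hypothetical helper: how many timestamps would a given step select?
# In production this would be a COUNT(*) query with the same WHERE clause.
def rows_for_step(timestamps, start_ts, end_ts, step)
  timestamps.count { |t| t.between?(start_ts, end_ts) && (t % step).zero? }
end

# Start from the floored quotient (integer division floors for positive
# values) and decrease the step until enough rows are selected.
def find_step(timestamps, start_ts, end_ts, wanted)
  step = [(end_ts - start_ts) / wanted, 1].max
  while step > 1 && rows_for_step(timestamps, start_ts, end_ts, step) < wanted
    step -= 1
  end
  step
end

# Illustrative data only: one row every 3 seconds, 120 rows wanted.
sample = (0..999).step(3).to_a
step = find_step(sample, 0, 999, 120)
puts "step=#{step} rows=#{rows_for_step(sample, 0, 999, step)}"
```

A linear countdown is enough here because the starting step is already close to the answer. This only guarantees at least the wanted number of rows; the surplus still has to be trimmed, and the inclusive start and end rows still need separate handling.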