I’ve been using a pretty simple array formula in Excel to crunch some datasets, but they’re getting too large and absolutely destroying my computer’s performance whenever I update the calculations.
The Excel sheet and MySQL database are laid out like so:
+------------+-------+
| Timestamp  | value |
+------------+-------+
| 1340816430 | .02   |
+------------+-------+
x600,000 rows
Here’s the Excel formula:
{=AVERAGEIFS(B:B,A:A,"<"&A1+1000,A:A,">"&A1-1000)}
That returns the average of the values whose timestamps fall within 1000 of the current row’s timestamp, and it is the third column in the Excel sheet. Is there a plausible way to create a MySQL query that performs a similar operation and returns a column with the values that would have been in the third column had I run Excel’s formula?
If you are happy using Excel formulas you can speed up this calculation a lot (a factor of over 3000 on my system). This assumes that column A contains the timestamps in ascending order and column B the values (if the data is not already sorted, use Excel's Sort).
In column C put =IFERROR(MATCH(A1-1000,$A:$A,1),1) and copy down. This finds the row number of the last row whose timestamp is less than or equal to A1-1000, falling back to row 1 if there is no such row.
In column D put =IFERROR(MATCH(A1+1000,$A:$A,1),1048576) and copy down. This finds the row number of the last row whose timestamp is less than or equal to A1+1000, falling back to the last sheet row (1048576) on error.
In column E put =AVERAGE(OFFSET(B1,C1-ROW(),0,D1-C1+1,1)) and copy down. This averages the values in column B from row C1 through row D1.
On my system a full calculation of these formulas over 1000K rows takes 20 seconds.
The disadvantage of this method is that it's volatile (OFFSET is a volatile function), so it will recalculate whenever you make a change, but I assume that you are in Manual calculation mode anyway.
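If you would rather move the calculation out of the spreadsheet, the same idea (find the first and last row of each window in sorted timestamps, then average that block) can be sketched in Python. This is a minimal illustration, not a drop-in replacement: the function name windowed_averages and the half_width parameter are made up for the example, the timestamps are assumed to be sorted ascending, and the window uses the strict bounds from the original AVERAGEIFS criteria.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate


def windowed_averages(timestamps, values, half_width=1000):
    """For each row, average the values whose timestamp t satisfies
    row_ts - half_width < t < row_ts + half_width.

    Assumes timestamps is sorted ascending and half_width > 0, so every
    window contains at least the row itself.
    """
    # prefix[k] holds the sum of values[0:k], so the sum of any
    # contiguous block of rows is a single subtraction.
    prefix = [0.0, *accumulate(values)]
    result = []
    for t in timestamps:
        # First index whose timestamp is strictly greater than t - half_width.
        lo = bisect_right(timestamps, t - half_width)
        # First index whose timestamp is at or beyond t + half_width (excluded).
        hi = bisect_left(timestamps, t + half_width)
        result.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return result


if __name__ == "__main__":
    ts = [0, 500, 1000, 1500, 3000]
    vals = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(windowed_averages(ts, vals))
```

Each row costs two binary searches instead of a scan of the whole column, which is the same saving the MATCH columns give. Because the bounds here are strict while MATCH with match type 1 is a less-than-or-equal lookup, results can differ slightly at the window edges from column E. Running sums over a very long list of floats can also accumulate rounding error, so treat the output as approximate.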