Here’s a puzzler for you:
I’m keeping stats of cluster computing stuff in a MySQL table named ‘jobs’. Each job row has a host the job executed on (not unique), a job execution time in seconds, and a unique integer as the PK so I can order the completed jobs simply by ordering the PK.
As of right now, using average and group by, I can find the average execution time in seconds for each host over all of the jobs completed. Instead of averaging all the execution times per host, I want the average time of the last five jobs per host.
There are all sorts of examples for operations with GROUP BY, and lots of examples for operations with LIMIT, but is there any way of combining the two in a fairly straightforward MySQL query?
EDIT: In case I wasn’t clear, I want the average of the last five execution times for host 1, the average of the last five execution times for host 2, and so on.
My initial reaction was to use LIMIT to restrict the average to 5 results, which led me to suggest:
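Roughly, and assuming columns named `id` (the PK), `host`, and `seconds` (the question does not give the actual column names), that first attempt looks something like this sketch:

```sql
-- Assumed schema: jobs(id INT PRIMARY KEY, host VARCHAR, seconds INT).
-- The column names are hypothetical; substitute your own.
SELECT host, AVG(seconds) AS avg_seconds
FROM (
    -- Take the five most recently completed jobs across ALL hosts
    SELECT host, seconds
    FROM jobs
    ORDER BY id DESC
    LIMIT 5
) AS recent
GROUP BY host;
```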
But it is clear that this limits the average to the most recent 5 jobs, and not the most recent 5 jobs per host.
LIMIT applies to the result set as a whole, not to each group, so it seems difficult to use it to restrict the average per host without resorting to some kind of stored procedure. This led me to consider assigning each job a per-host completion order, or position, using a MySQL user variable.
This is untested, but the theory it illustrates should be a good starting point:
First, we should assign each job a position based on its host:
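A minimal sketch of that step, again assuming hypothetical columns `id`, `host`, and `seconds`. The variables `@pos` and `@prev_host` are names I made up for illustration. Note that MySQL does not guarantee the evaluation order of user variables within a SELECT, and assigning them inside expressions is deprecated in MySQL 8.0, so treat this as a starting point rather than a guaranteed result:

```sql
-- Reset the counters before each run
SET @prev_host := NULL, @pos := 0;

SELECT id, host, seconds,
       -- Restart at 1 when the host changes, otherwise count up
       @pos := IF(@prev_host = host, @pos + 1, 1) AS position,
       -- Remember this row's host for the comparison on the next row
       @prev_host := host AS host_marker
FROM jobs
ORDER BY host, id DESC;  -- newest job per host should get position 1
```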
Once each job has a position, apply the aggregate function and restrict the rows to the first 5 positions for each host:
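Put together as one untested sketch, with the ranking wrapped in a derived table. As before, the column names `id`, `host`, and `seconds` and the variables `@pos` and `@prev_host` are assumptions for illustration:

```sql
SET @prev_host := NULL, @pos := 0;

SELECT host, AVG(seconds) AS avg_last_five
FROM (
    -- Number each host's jobs from newest (1) to oldest
    SELECT host, seconds,
           @pos := IF(@prev_host = host, @pos + 1, 1) AS position,
           @prev_host := host AS host_marker
    FROM jobs
    ORDER BY host, id DESC
) AS ranked
WHERE position <= 5   -- keep only the five most recent jobs per host
GROUP BY host;
```

If the optimizer does not honor the ORDER BY inside the derived table, the positions will be wrong, so check the inner query's output on its own first. On MySQL 8.0 or later, `ROW_NUMBER() OVER (PARTITION BY host ORDER BY id DESC)` is a supported way to compute the same position without user variables.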
Please let me know if this works for you, or if there are more aspects I have not considered. This is an intriguing problem.