I have two tables: servers and stats servers has a column called id that

Question

0

Asked: June 6, 20262026-06-06T17:44:40+00:00 2026-06-06T17:44:40+00:00

I have two tables: servers and stats servers has a column called id that

0

I have two tables: “servers” and “stats”

servers has a column called “id” that auto-increments.
stats has a column called “server” that corresponds to a row in the servers table, a column called “time” that represents the time it was added, and a column called “votes” that I would like to get the average of.

I would like to fetch all the servers (SELECT * FROM servers) along with the average votes of the 24 most recent rows that correspond to each server. I believe this is a “greatest-n-per-group” question.

This is what I tried to do, but it gave me 24 rows total, not 24 rows per group:

SELECT servers.*,
       IFNULL(AVG(stats.votes), 0) AS avgvotes
FROM servers
LEFT OUTER JOIN
  (SELECT server,
          votes
   FROM stats
   GROUP BY server
   ORDER BY time DESC LIMIT 24) AS stats ON servers.id = stats.server
GROUP BY servers.id

Like I said, I would like to get the 24 most recent rows for each server, not 24 most recent rows total.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-06T17:44:42+00:00

This is another approach.

This query is going to suffer the same performance problems as other queries here that return correct results, because the execution plan for this query is going to require a SORT operation on EVERY row in the stats table. Since there is no predicate (restriction) on the time column, EVERY row in the stats table will be considered. For a REALLY large stats table, this is going to blow out all available temporary space before it dies a horrible death. (More notes on performance below.)

SELECT r.*
     , IFNULL(s.avg_votes,0)
  FROM servers r
  LEFT 
  JOIN ( SELECT t.server
              , AVG(t.votes) AS avg_votes
           FROM ( SELECT CASE WHEN u.server = @last_server 
                           THEN @i := @i + 1
                           ELSE @i := 1 
                         END AS i
                       , @last_server := u.server AS `server`
                       , u.votes AS votes
                    FROM (SELECT @i := 0, @last_server := NULL) i
                    JOIN ( SELECT v.server, v.votes
                             FROM stats v
                            ORDER BY v.server DESC, v.time DESC
                         ) u
                ) t
          WHERE t.i <= 24
          GROUP BY t.server
       ) s
    ON s.server = r.id

What this query is doing is sorting the stats table, by server and by descending order on the time column. (Inline view aliased as u.)

With the sorted result set, we assign a row numbers 1,2,3, etc. to each row for each server. (Inline view aliased as t.)

With that result set, we filter out any rows with a rownumber > 24, and we calculate an average of the votes column for the “latest” 24 rows for each server. (Inline view aliased as s.)

As a final step, we join that to the servers table, to return the requested resultset.

NOTE:

The execution plan for this query will be COSTLY for a large number of rows in the stats table.

To improve performance, there are several approaches we could take.

The simplest might to be include in the query a predicate the EXCLUDES a significant number of rows from the stats table (e.g. rows with time values over 2 days old, or over 2 weeks old). That would significantly reduce the number of rows that need to be sorted, to determine the “latest” 24 rows.

Also, with an index on stats(server,time), it’s also possible that MySQL could do a relatively efficient “reverse scan” on the index, avoiding a sort operation.

We could also consider implementing an index on the stats table on (server,"reverse_time"). Since MySQL doesn’t yet support descending indexes, the implementation would really be a regular (ascending) index on an a derived rtime value (a “reverse time” expression that is ascending for descending values of time (for example, -1*UNIX_TIMESTAMP(my_timestamp) or -1*TIMESTAMPDIFF('1970-01-01',my_datetime).

Another approach to improve performance would be to keep a shadow table containing the most recent 24 rows for each server. That would be simplest to implement if we can guarantee that “latest rows” won’t be deleted from the stats table. We could maintain that table with a trigger. Basically, whenever a row is inserted into the stats table, we check if the time on the new rows is later than the earliest time stored for the server in the shadow table, if it is, we replace the earliest row in the shadow table with the new row, being sure to keep no more than 24 rows in the shadow table for each server.

And, yet another approach is to write a procedure or function that gets the result. The approach here would be to loop through each server, and run a separate query against the stats table to get the average votes for the latest 24 rows, and gather all of those results together. (That approach mighty really be more of a workaround to avoiding a sort on huge temporary set, just to enable the resultset to be returned, not necessarily making the return of the resultset blazingly fast.)

The bottom line for performance of this type of query on a LARGE table is restricting the number of rows considered by the query AND avoiding a sort operation on a large set. That’s how we get a query like this to perform.

ADDENDUM

To get a “reverse index scan” operation (to get the rows from stats ordered using an index WITHOUT a filesort operation), I had to specify DESCENDING on both expressions in the ORDER BY clause. The query above previously had ORDER BY server ASC, time DESC, and MySQL always wanted to do a filesort, even specifying the FORCE INDEX FOR ORDER BY (stats_ix1) hint.

If the requirement is to return an ‘average votes’ for a server only if there are at least 24 associated rows in the stats table, then we can make a more efficient query, even if it is a bit more messy. (Most of the messiness in the nested IF() functions is to deal with NULL values, which do not get included in the average. It can be much less messy if we have a guarantee that votes is NOT NULL, or if we exclude any rows where votes is NULL.)

SELECT r.*
     , IFNULL(s.avg_votes,0)
  FROM servers r
  LEFT 
  JOIN ( SELECT t.server
              , t.tot/NULLIF(t.cnt,0) AS avg_votes
           FROM ( SELECT IF(v.server = @last_server, @num := @num + 1, @num := 1) AS num
                       , @cnt := IF(v.server = @last_server,IF(@num <= 24, @cnt := @cnt + IF(v.votes IS NULL,0,1),@cnt := 0),@cnt := IF(v.votes IS NULL,0,1)) AS cnt
                       , @tot := IF(v.server = @last_server,IF(@num <= 24, @tot := @tot + IFNULL(v.votes,0)      ,@tot := 0),@tot := IFNULL(v.votes,0)      ) AS tot
                       , @last_server := v.server AS SERVER
                    -- , v.time
                    -- , v.votes
                    -- , @tot/NULLIF(@cnt,0) AS avg_sofar
                    FROM (SELECT @last_server := NULL, @num:= 0, @cnt := 0, @tot := 0) u
                    JOIN stats v FORCE INDEX FOR ORDER BY (stats_ix1)
                   ORDER BY v.server DESC, v.time DESC
                ) t
          WHERE t.num = 24
       ) s
    ON s.server = r.id

With a covering index on stats(server,time,votes), the EXPLAIN showed MySQL avoided a filesort operation, so it must have used a “reverse index scan” to return the rows in order. Absent the covering index, and index on ‘(server,time), MySQL used the index if I included an index hint, with theFORCE INDEX FOR ORDER BY (stats_ix1)` hint, MySQL avoided a filesort as well. (But since my table had less than 100 rows, I don’t think MySQL put much emphasis on avoiding a filesort operation.)

The time, votes, and avg_sofar expressions are commented out (in the inline view aliased as t); they aren’t needed, but they are for debugging.

The way that query stands, it needs at least 24 rows in stats for each server, in order to return an average. (That may be acceptable.) But I was thinking that in general, we could return a running total, total so far (tot) and a running count (cnt).

(If we replace the WHERE t.num = 24 with WHERE t.num <= 24, we can see the running average in action.)

To return the average where there aren’t at least 24 rows in stats, that’s really a matter of identifying the row (for each server) with the maximum value of num that is <= 24.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have two tables: servers and stats servers has a column called id that

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply