I want grouped ranking on a very large table, I’ve found a couple of


Asked: May 11, 2026 (2026-05-11T06:44:26+00:00)


I want grouped ranking on a very large table. I’ve found a couple of solutions for this problem, e.g. in this post and in other places on the web, but I am unable to figure out their worst-case complexity. The specific problem consists of a table where each row holds a name and a number of points. I want to be able to request rank intervals such as 1-4. Here is some example data:

name | points
Ab   | 14
Ac   | 14
B    | 16
C    | 16
Da   | 15
De   | 13

With these values the following ‘ranking’ is created:

Query id | Rank | Name
1        | 1    | B
2        | 1    | C
3        | 3    | Da
4        | 4    | Ab
5        | 4    | Ac
6        | 6    | De

It should then be possible to request an interval of query ids, for example 2-5, which gives the ranks 1, 3, 4 and 4.
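To make the expected numbering concrete, here is a minimal sketch in Python (used only to illustrate the logic, not the database side). The function name `ranked` is hypothetical. It orders the sample rows by points descending and name ascending, lets rows with equal points share the rank of the first row in their group, and then picks out query ids 2-5.

```python
def ranked(rows):
    """Return (query_id, rank, name) tuples for (name, points) rows.

    Rows are ordered by points descending, then name ascending; rows with
    equal points share the rank of the first row in their group.
    """
    ordered = sorted(rows, key=lambda r: (-r[1], r[0]))
    result = []
    rank = 0
    prev_points = None
    for query_id, (name, points) in enumerate(ordered, start=1):
        if points != prev_points:
            # A new points value starts a new rank group.
            rank = query_id
            prev_points = points
        result.append((query_id, rank, name))
    return result


rows = [("Ab", 14), ("Ac", 14), ("B", 16), ("C", 16), ("Da", 15), ("De", 13)]
table = ranked(rows)
interval = [rank for query_id, rank, _ in table if 2 <= query_id <= 5]
print(interval)  # [1, 3, 4, 4]
```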

The database holds about 3 million records, so if possible I want to avoid a solution with complexity greater than O(log n). There are constant updates and inserts on the database, so these operations should preferably run in O(log n) as well. I am not sure this is possible, though, and I’ve been trying to wrap my head around it for some time. I’ve come to the conclusion that a binary search should be possible, but I haven’t been able to write a query that does this. I am using a MySQL server.

I will elaborate on how the pseudo code for the filtering could work. Firstly, an index on (points, name) is needed. As input you give a fromrank and a tillrank. The total number of records in the database is n. The pseudocode should look something like this:

Find the median point value and count the rows with fewer points than this value (the count gives a rough estimate of the rank, ignoring rows with the same number of points). If the count is greater than the fromrank delimiter, subdivide the first half and find its median. Keep doing this until we have pinpointed the number of points at which fromrank should start. Then do the same within that number of points using the name part of the index, taking medians until we reach the correct row. Do exactly the same for tillrank.

The result should be log(n) number of subdivisions. So given the median and count can be made in log(n) time it should be possible to solve the problem in worst case complexity log(n). Correct me if I am wrong.
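The subdivision idea can be sketched in Python on an in-memory sorted list. This only illustrates the lookup, under the assumption that counting rows at or above a value costs O(log n); a sorted list provides that through `bisect`, and nothing here shows that a MySQL index does. The names `rank_at` and `count_at_least` are hypothetical, points are assumed to be integers, and only the points step is shown, not the subdivision by name.

```python
from bisect import bisect_left, bisect_right


def rank_at(sorted_points, query_id):
    """Return (points, rank) for the row at 1-based position query_id.

    sorted_points must be sorted ascending. Positions are counted from the
    highest score down, as in the ranking table of the question.
    """
    n = len(sorted_points)
    if not 1 <= query_id <= n:
        raise IndexError("query_id out of range")

    def count_at_least(value):
        # Rows with points >= value; one binary search, O(log n).
        return n - bisect_left(sorted_points, value)

    # Bisect on the point value: find the largest value that still has at
    # least query_id rows at or above it.
    lo, hi = sorted_points[0], sorted_points[-1]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if count_at_least(mid) >= query_id:
            lo = mid
        else:
            hi = mid - 1

    # Rank = 1 + number of rows with strictly more points.
    rank = n - bisect_right(sorted_points, lo) + 1
    return lo, rank


points = sorted([14, 14, 16, 16, 15, 13])
print([rank_at(points, q)[1] for q in range(2, 6)])  # [1, 3, 4, 4]
```

Each call performs a number of bisection steps logarithmic in the range of point values, with one O(log n) count per step. Keeping a plain Python list sorted under inserts costs O(n) per insert, so the sketch says nothing about update cost.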


1 Answer

score 0 · Answer 1 · 2026-05-11T06:44:26+00:00

You need a stored procedure to be able to call this with parameters:

    -- rank is a reserved word from MySQL 8.0 on, hence the backticks.
    CREATE TABLE `rank` (name VARCHAR(20) NOT NULL, points INTEGER NOT NULL);

    CREATE INDEX ix_rank_points ON `rank` (points, name);

    -- In the mysql command-line client, change the statement delimiter
    -- (DELIMITER) around CREATE PROCEDURE so the body is sent as one statement.
    CREATE PROCEDURE prc_ranks(fromrank INT, tillrank INT)
    BEGIN
      SET @fromrank = fromrank;
      SET @tillrank = tillrank;
      PREPARE STMT FROM
      '
      SELECT  rn, `rank`, name, points
      FROM  (
        SELECT  CASE WHEN @cp = points THEN @rank ELSE @rank := @rn + 1 END AS `rank`,
                @rn := @rn + 1 AS rn,
                @cp := points,
                r.*
        FROM (
             SELECT @cp := -1, @rn := 0, @rank := 1
             ) var,
             (
             SELECT *
             FROM `rank`
             FORCE INDEX (ix_rank_points)
             ORDER BY
               points DESC, name DESC
             LIMIT ?
             ) r
        ) o
      WHERE rn >= ?
      ';
      EXECUTE STMT USING @tillrank, @fromrank;
    END;

    CALL prc_ranks (2, 5);

If you create the index and force MySQL to use it (as in my query), then the complexity of the query will not depend on the number of rows at all, it will depend only on tillrank.

The query takes the last tillrank entries from the index, performs some simple calculations on them and filters out the rows before fromrank.

As you can see, the time of this operation depends only on tillrank, not on how many records there are.
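What the query does with its user variables can be mirrored in a short Python sketch. It is an illustration only: `ranks_between` is a hypothetical name, and a pre-sorted list stands in for reading the (points, name) index from its end. The loop touches at most tillrank rows, which is why the cost does not grow with the table.

```python
from itertools import islice


def ranks_between(rows_desc, fromrank, tillrank):
    """Mimic the query: read tillrank rows, number them, drop rn < fromrank.

    rows_desc yields (name, points) ordered by points DESC, name DESC,
    standing in for a scan of the (points, name) index from its end.
    """
    cp = None   # points of the previous row (@cp)
    rn = 0      # running row number (@rn)
    rank = 0    # rank of the current group of equal points (@rank)
    out = []
    for name, points in islice(rows_desc, tillrank):  # LIMIT tillrank
        if points != cp:
            rank = rn + 1
        rn += 1
        cp = points
        if rn >= fromrank:  # WHERE rn >= fromrank
            out.append((rn, rank, name, points))
    return out


rows = [("Ab", 14), ("Ac", 14), ("B", 16), ("C", 16), ("Da", 15), ("De", 13)]
rows_desc = sorted(rows, key=lambda r: (r[1], r[0]), reverse=True)
for row in ranks_between(rows_desc, 2, 5):
    print(row)
# (2, 1, 'B', 16)
# (3, 3, 'Da', 15)
# (4, 4, 'Ac', 14)
# (5, 4, 'Ab', 14)
```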

I just checked it on 400,000 rows: it selects ranks 5 to 100 in 0.004 seconds (that is, instantly).

Important: this only works if you sort on names in DESCENDING order. MySQL versions before 8.0 do not support the DESC clause in index definitions (it is parsed but ignored), which means that points and name must be sorted in the same direction for the index to be usable for sorting (either both ASCENDING or both DESCENDING). If you want fast ASC sorting by name, you will need to keep negative points in the database and change the sign in the SELECT clause.
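The negative-points trick can be illustrated with a small Python sketch (the function name `order_with_negated_points` is hypothetical). Storing the negated score means a single ascending sort on (negated points, name) yields the highest scores first with names in ascending order; the sign is flipped back on output.

```python
def order_with_negated_points(rows):
    """Sort (name, points) rows by one ascending key: (-points, name)."""
    stored = sorted((-points, name) for name, points in rows)  # both ASC
    # Flip the sign back when reading, as the SELECT clause would.
    return [(name, -neg_points) for neg_points, name in stored]


rows = [("Ab", 14), ("Ac", 14), ("B", 16), ("C", 16), ("Da", 15), ("De", 13)]
print([name for name, _ in order_with_negated_points(rows)])
# ['B', 'C', 'Da', 'Ab', 'Ac', 'De']
```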

You may also remove name from the index altogether and do the final ordering without using an index:

    CREATE INDEX ix_rank_points ON `rank` (points);

    CREATE PROCEDURE prc_ranks(fromrank INT, tillrank INT)
    BEGIN
      SET @fromrank = fromrank;
      SET @tillrank = tillrank;
      PREPARE STMT FROM
      '
      SELECT  rn, `rank`, name, points
      FROM  (
        SELECT  CASE WHEN @cp = points THEN @rank ELSE @rank := @rn + 1 END AS `rank`,
                @rn := @rn + 1 AS rn,
                @cp := points,
                r.*
        FROM (
             SELECT @cp := -1, @rn := 0, @rank := 1
             ) var,
             (
             SELECT *
             FROM `rank`
             FORCE INDEX (ix_rank_points)
             ORDER BY
               points DESC
             LIMIT ?
             ) r
        ) o
      WHERE rn >= ?
      ORDER BY `rank`, name
      ';
      EXECUTE STMT USING @tillrank, @fromrank;
    END;

That will impact performance on big ranges, but you will hardly notice it on small ranges.
