I’ve been having some difficulty scaling up the application and decided to ask a

Asked: June 16, 2026 (2026-06-16T09:03:08+00:00)

I’ve been having some difficulty scaling up the application and decided to ask a question here.

Consider a relational database (say MySQL). Let’s say it allows users to make posts, which are stored in the post table (fields: postid, posterid, data, timestamp). So, when you retrieve all of your own posts sorted by recency, you simply get all posts with posterid = you and order by timestamp. Simple enough.

This process will use timestamp as the index since it has the highest cardinality and correctly so. So, beyond looking into the indexes, it’ll take literally 1 row fetch from disk to complete this task. Awesome!

But let’s say it’s been 1 million more posts (in the system) by other users since you last posted. Then, in order to get your latest post, the database will peg the index on timestamp again, and it’s not like we know how many posts have happened since then (or should we at least manually estimate and set preferred key)? Then we wasted looking into a million and one rows just to fetch a single row.

Additionally, a set of posts from multiple arbitrary users would be one of the use cases, so I cannot make fields like userid_timestamp to create a sub-index.

Am I seeing this wrong? Or what must be changed fundamentally from the application to allow such operation to occur at least somewhat efficiently?

1 Answer

Editorial Team · Answer 1 · 2026-06-16T09:03:09+00:00

Indexing

If you have a query: ... WHERE posterid = you ORDER BY timestamp [DESC], then you need a composite index on {posterid, timestamp}.

Finding all posts of a given user is done by a range scan on the index’s leading edge (posterid).
Finding a user’s oldest or newest post takes a single index seek. Its cost is proportional to the B-Tree height, which in turn is proportional to log(N), where N is the number of indexed rows.

To understand why, take a look at Anatomy of an SQL Index.
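As a minimal sketch of the index described above: this assumes the question's table is named post, and the index name and the literal user IDs are made up for illustration. The last query touches the multi-user case from the question; how well it performs depends on the optimizer, so treat the comment there as something to verify with EXPLAIN rather than a guarantee.

```sql
-- Assumption: the question's table is named `post` with columns
-- (postid, posterid, data, `timestamp`). Index name and IDs are illustrative.
CREATE INDEX idx_post_poster_time ON post (posterid, `timestamp`);

-- Newest post of one user: a seek into the (posterid, timestamp) index,
-- followed by a lookup of the matching row.
SELECT postid, data, `timestamp`
FROM post
WHERE posterid = 42
ORDER BY `timestamp` DESC
LIMIT 1;

-- Posts from several arbitrary users, newest first. The index narrows the
-- read to those users' entries, but combining them into one ordering may
-- still need a sort step; EXPLAIN shows what the optimizer actually chose.
EXPLAIN
SELECT postid, posterid, data, `timestamp`
FROM post
WHERE posterid IN (42, 57, 99)
ORDER BY `timestamp` DESC
LIMIT 20;
```

With this index in place, the number of posts other users have made since your last one no longer matters for the single-user query, because the seek goes straight to your posterid.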

Clustering

The leaves of a “normal” B-Tree index hold “pointers” (physical addresses) to the indexed rows, while the rows themselves reside in a separate data structure called the “table heap”. The heap can be eliminated by storing rows directly in the leaves of the B-Tree, which is called clustering. This has its pros and cons, but if you have one predominant kind of query, eliminating the table heap access through clustering is definitely something to consider.

In this particular case, the table could be created like this:

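CREATE TABLE T (
    posterid INT NOT NULL,
    `timestamp` DATETIME NOT NULL,
    data VARCHAR(50),
    -- The primary key doubles as the clustering key: rows are stored
    -- in the B-Tree leaves ordered by (posterid, timestamp).
    PRIMARY KEY (posterid, `timestamp`)
);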

MySQL’s InnoDB engine clusters all of its tables and uses the primary key as the clustering key. We haven’t used the surrogate key (postid) here, since secondary indexes in clustered tables can be expensive and we already have a natural key. If you really need the surrogate key, consider making it an alternate key and keeping the clustering established through the natural key.
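If the surrogate key is needed, one way to keep it as an alternate key might look like the sketch below. The table and index names, the BIGINT type, and the switch to DATETIME(6) are assumptions for illustration and do not come from the question. The fractional seconds only make it less likely that two posts by the same user collide on the primary key.

```sql
-- Sketch: clustering stays on the natural key, postid becomes an
-- alternate (unique) key. Names and column types are illustrative.
CREATE TABLE post (
    postid      BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    posterid    INT NOT NULL,
    `timestamp` DATETIME(6) NOT NULL,  -- assumption: microsecond precision
    data        VARCHAR(50),
    PRIMARY KEY (posterid, `timestamp`),  -- clustering key in InnoDB
    UNIQUE KEY uq_post_postid (postid)    -- secondary index on the surrogate key
) ENGINE = InnoDB;
```

The trade-off is that a lookup by postid first walks the secondary index to find the primary key value and then walks the clustered index to reach the row, so it costs two B-Tree traversals instead of one.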
