How to speed up select count(*) with group by ? It’s too slow and

Asked: May 12, 2026

How to speed up select count(*) with group by?
It’s too slow and is used very frequently.
I’m having big trouble using select count(*) with group by on a table that has more than 3,000,000 rows.

select object_title, count(*) as hot_num
from relations
where relation_title = 'XXXX'
group by object_title

relation_title and object_title are both varchar.
The condition where relation_title='XXXX' matches more than 1,000,000 rows, so the index on object_title does not help much.

1 Answer

Editorial Team · Answer 1 · 2026-05-12T17:15:40+00:00

Here are several things I’d try, in order of increasing difficulty:

(easier) – Make sure you have the right covering index

CREATE INDEX ix_temp ON relations (relation_title, object_title);

This should maximize perf given your existing schema. With relation_title first, MySQL can seek straight to the rows matching your WHERE clause and read them already ordered by object_title, which minimizes the I/O needed to satisfy your query (if the columns were in the reverse order, the whole index would have to be scanned). The index also covers the query, so you won’t have to touch the clustered index (unless your version of MySQL’s optimizer is really dumb!).
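To check that the index really is being used as a covering index, you can run the query through EXPLAIN. This is only a sketch against the table from the question; the exact plan depends on your MySQL version and your data.

```sql
-- Assumes ix_temp (relation_title, object_title) has been created as above.
EXPLAIN
SELECT object_title, COUNT(*) AS hot_num
FROM relations
WHERE relation_title = 'XXXX'
GROUP BY object_title;

-- What to look for in the output:
--   key   = ix_temp                 (the composite index was chosen)
--   Extra contains "Using index"    (the query is answered from the index alone)
```

If Extra shows "Using temporary" or "Using filesort" instead, the optimizer is probably not reading the rows in index order, and it’s worth checking which index it picked.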

(a little harder) – make sure your varchar fields are as small as possible

One of the perf challenges with varchar columns in MySQL is that some query-processing steps (in-memory temporary tables, for example) can allocate the full declared width of the column rather than the bytes actually stored. So if you have a varchar(256) but are only using 4 chars, you may still be paying for the full 256 in RAM while the query is being processed. Ouch! So if you can shrink your varchar limits easily, this should speed up your queries.
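A rough way to see how much room you have is to compare the longest stored value with the declared length, and then shrink the column. The VARCHAR(64) below is a made-up number, and the NOT NULL is an assumption about your schema, so treat this as a sketch rather than something to paste in.

```sql
-- Longest values actually stored, in characters.
SELECT MAX(CHAR_LENGTH(relation_title)) AS max_relation_len,
       MAX(CHAR_LENGTH(object_title))   AS max_object_len
FROM relations;

-- Hypothetical new size: pick something comfortably above the maximum found.
-- MODIFY replaces the whole column definition, so restate nullability,
-- default and character set exactly as they are today.
ALTER TABLE relations
  MODIFY relation_title VARCHAR(64) NOT NULL;
```

On a table with millions of rows the ALTER can take a while, so it’s probably something to try on a copy first.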

(harder) – Normalize

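Roughly a third of your rows (1,000,000+ out of 3,000,000+) sharing a single string value is a clear cry for normalizing into another table so you’re not duplicating strings millions of times. Consider normalizing into three tables (one for relation titles, one for object titles, and one linking them) and using integer IDs to join them.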

In some cases, you can normalize under the covers and hide the normalization with views which match the name of the current table… then you only need to make your INSERT/UPDATE/DELETE queries aware of the normalization but can leave your SELECTs alone.
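Here is one way the three-table layout and the hiding view could look. Every table, column, index and view name below is hypothetical, and the VARCHAR(128) size is a placeholder, so adapt it to your real schema.

```sql
-- Lookup tables: each distinct string is stored once.
CREATE TABLE relation_titles (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(128) NOT NULL,
  UNIQUE KEY uq_relation_title (title)
);

CREATE TABLE object_titles (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(128) NOT NULL,
  UNIQUE KEY uq_object_title (title)
);

-- Link table: integers only, with the same column order as the covering index.
CREATE TABLE relations_norm (
  relation_title_id INT UNSIGNED NOT NULL,
  object_title_id   INT UNSIGNED NOT NULL,
  KEY ix_rel_obj (relation_title_id, object_title_id)
);

-- View exposing the original column names for existing SELECTs.
CREATE VIEW relations_view AS
SELECT rt.title AS relation_title, ot.title AS object_title
FROM relations_norm rn
JOIN relation_titles rt ON rt.id = rn.relation_title_id
JOIN object_titles   ot ON ot.id = rn.object_title_id;

-- The hot query: count on integer IDs first, resolve titles at the end.
SELECT ot.title AS object_title, c.hot_num
FROM (
  SELECT rn.object_title_id, COUNT(*) AS hot_num
  FROM relations_norm rn
  JOIN relation_titles rt ON rt.id = rn.relation_title_id
  WHERE rt.title = 'XXXX'
  GROUP BY rn.object_title_id
) AS c
JOIN object_titles ot ON ot.id = c.object_title_id;
```

Counting in the subquery keeps the big scan on the narrow integer index; the join back to object_titles only touches one row per group.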

(hardest) – Hash your string columns and index the hashes

If normalizing means changing too much code, but you can change your schema a little bit, you may want to consider creating 128-bit hashes for your string columns (using the MD5 function). In this case (unlike normalization) you don’t have to change all your queries, only the INSERTs and some of the SELECTs. Anyway, you’ll want to hash your string fields, and then create an index on the hashes, e.g.

CREATE INDEX ix_temp ON relations (relation_title_hash, object_title_hash);

Note that you’ll need to play around with the SELECT to make sure you are doing the computation via the hash index and not pulling in the clustered index (required to resolve the actual text value of object_title in order to satisfy the query).

Also, if relation_title has a small varchar size but object_title has a long size, then you can potentially hash only object_title and create the index on (relation_title, object_title_hash).

Note that this solution only helps if one or both of these fields is very long relative to the size of the hashes.

Also note that there are interesting case-sensitivity/collation impacts from hashing, since the hash of a lowercase string is not the same as the hash of an uppercase one. So you’ll need to make sure you apply canonicalization to the strings before hashing them. In other words, hash only the lowercased string if you’re in a case-insensitive DB. You also may want to trim spaces from the beginning or end, depending on how your DB handles leading/trailing spaces.
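Putting the hashing and canonicalization steps together, a sketch might look like this. The hash column names come from the index definition above; BINARY(16), the index name ix_hash, and the LOWER(TRIM(...)) canonicalization are my assumptions, so adjust them to match your collation rules.

```sql
-- MD5 yields 128 bits; UNHEX packs the 32 hex characters into 16 bytes.
ALTER TABLE relations
  ADD COLUMN relation_title_hash BINARY(16) NULL,
  ADD COLUMN object_title_hash   BINARY(16) NULL;

-- Backfill existing rows (on millions of rows, consider doing this in batches).
-- New INSERTs and UPDATEs must compute the hashes the same way.
UPDATE relations
SET relation_title_hash = UNHEX(MD5(LOWER(TRIM(relation_title)))),
    object_title_hash   = UNHEX(MD5(LOWER(TRIM(object_title))));

-- Hypothetical index name; same column order as the string index.
CREATE INDEX ix_hash ON relations (relation_title_hash, object_title_hash);

-- Count per object hash using only the hash index.
-- Canonicalize the search value exactly as the stored values were.
SELECT object_title_hash, COUNT(*) AS hot_num
FROM relations
WHERE relation_title_hash = UNHEX(MD5(LOWER(TRIM('XXXX'))))
GROUP BY object_title_hash;
```

This query returns hashes rather than titles on purpose: selecting object_title directly would pull in the clustered index again, so the text lookup is better done as a second step for the groups you actually need.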
