Asked: May 11, 2026

Let’s say I have a large database that consists of products in groups. Let’s say that there are 5 groups, each of which has 100,000 products. The product ids are random integers (and so are the group ids).

I need to find a product in a specific group (sid is the group id, pid is the product id). My question is which primary key is more efficient:

1. (sid, pid)
2. (pid, sid)

(sid, pid) is the intuitive choice, but when searching in this order, MySQL will have to isolate 100,000 of the 500,000 rows and then find a single number among those 100,000. On the other hand, (pid, sid) sounds more efficient to me, since it would keep MySQL from building the large 100,000-row group in the first stage and let it go directly to the right item (or to up to 5 items, if the same pid appears in different sids).

Is #2 indeed faster?

UPDATE:
OK. I copied a real table into two copies: items0 has primary key (sid, pid), and items1 has (pid, sid).

Results of the queries:

explain select * from items0 where sid = 22746 and pid = 2109418034
1, 'SIMPLE', 'items0', 'ref', 'PRIMARY', 'PRIMARY', '8', 'const,const', 14, ''

explain select * from items1 where sid = 22746 and pid = 2109418034
1, 'SIMPLE', 'items1', 'ref', 'PRIMARY', 'PRIMARY', '8', 'const,const', 11, ''

Yet another update:
I also added both keys to the same table and ran EXPLAIN. I got this (PRIMARY starts with sid, pid; index_2 starts with pid, sid):

1, 'SIMPLE', 'items', 'ref', 'PRIMARY,index_2', 'index_2', '8', 'const,const', 13, ''

I’m not sure what to make of this. What conclusions can I draw from this test?


Editorial Team · Answer 1 · 2026-05-11T22:15:28+00:00

The performance of a SQL DBMS query depends GREATLY on a large number of factors: how fragmented the table (or index) is, how fresh and complete the data/index statistics are, the size of your data caches, how much CPU and memory you have, how many rows are in the table, how the query is constructed, and so on.

Although profiling queries is a necessary part of performance tuning, it alone is not sufficient; it must be part of a larger query optimization strategy. Saying “test it and see” is not very helpful (and, in my opinion, sometimes dangerous!) in the general case, because from the user’s point of view query optimization is non-deterministic: the optimizer’s plan can change as statistics and data change. A query can run fine one day and slowly the next (or vice versa).

Without an understanding of the fundamentals of MySQL index construction, of which queries will be run, and of how those queries will use indexes, any ad hoc test is at best a lucky guess and at worst a ticking time bomb.

In this case there IS a rule of thumb, due to the way MySQL B-trees are constructed. The MySQL internals page (http://forge.mysql.com/wiki/MySQL_Internals_MyISAM#The_.MYI_file) shows that for a non-unique BTREE index on two columns, MySQL stores the concatenated values in the order you specify. That example stores ASCII (or Unicode) strings, but integer values are handled in a similar way (open a hex editor and decode the actual values if you are intrepid enough!). See also http://dev.mysql.com/doc/refman/5.0/en/multiple-column-indexes.html.
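To make the “concatenated values” idea concrete, here is a minimal Python sketch. It assumes each key column is an unsigned 32-bit integer written big-endian, so byte-wise order equals numeric order; the `pack_key` and `unpack_key` helpers are hypothetical, and this is not MyISAM’s actual on-disk format. It shows that sorting the concatenated bytes gives the same order as sorting (sid, pid) column by column, which is why all rows of one sid end up next to each other in the index.

```python
import struct

def pack_key(first: int, second: int) -> bytes:
    # Hypothetical encoding: two unsigned 32-bit ints, big-endian, concatenated
    # in index column order. Big-endian makes byte order match numeric order.
    return struct.pack(">II", first, second)

def unpack_key(key: bytes) -> tuple:
    return struct.unpack(">II", key)

# (sid, pid) pairs in arbitrary insertion order.
rows = [(22746, 2109418034), (501, 77), (22746, 5), (501, 900000)]

# Sort the raw byte strings, as a B-tree orders its keys.
index_order = [unpack_key(k) for k in sorted(pack_key(s, p) for s, p in rows)]

print(index_order)
# [(501, 77), (501, 900000), (22746, 5), (22746, 2109418034)]
```

Swapping the arguments to `pack_key(p, s)` would model a (pid, sid) key instead, where the two rows of sid 501 would no longer be adjacent.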

So, the rule of thumb is to put the most selective value first (ref http://www.akadia.com/services/ora_index_selectivity.html), because that gives the query processor the most information with which to narrow down the number of rows to be processed. Placing a less selective key FIRST forces the optimizer to consider more rows and, unless that is EXACTLY what you want, is suboptimal by design.
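Selectivity here means the number of distinct values in a column divided by the number of rows, so 1.0 means every value is unique. With the question’s numbers, sid has 5 distinct values in 500,000 rows (5 / 500,000 = 0.00001), while random product ids are close to 1.0. The Python sketch below computes this on hypothetical, scaled-down data (5 groups of 1,000 products instead of 100,000, with random 31-bit ids); the `selectivity` helper is illustrative only.

```python
import random

def selectivity(values) -> float:
    # Distinct values divided by total rows: closer to 1.0 means more selective.
    return len(set(values)) / len(values)

rng = random.Random(42)  # fixed seed so the sketch is reproducible
# Hypothetical scaled-down data: 5 random group ids, 1,000 random product ids each.
group_ids = [rng.getrandbits(31) for _ in range(5)]
rows = [(sid, rng.getrandbits(31)) for sid in group_ids for _ in range(1000)]

sid_sel = selectivity([sid for sid, _ in rows])
pid_sel = selectivity([pid for _, pid in rows])
print(f"sid: {sid_sel:.4f}  pid: {pid_sel:.4f}")
```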

Also, to piggyback on what Eric said: MySQL (and other DBMSs) can use the leftmost columns of a composite key, in order, to help narrow down the search. For example, if you place an index on (A, B, C), then queries with WHERE A = ... AND B = ... can use it (depending on the optimizer), queries with WHERE A = ... can use it, but queries that only ask for WHERE C = ... usually cannot.
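The reason only a leftmost prefix helps is that a composite B-tree keeps its entries sorted column by column. The Python sketch below models an (A, B, C) index as a sorted list of tuples (a simplification with no pages or tree nodes; `prefix_range` and `scan_for_c` are hypothetical helpers). Entries that share a leftmost prefix form one contiguous run that two binary searches can bound, while entries with a given C value are scattered and can only be found by checking everything.

```python
from bisect import bisect_left, bisect_right

# Model of an index on (A, B, C): entries kept in sorted (lexicographic) order.
index = sorted([(2, 2, 7), (1, 1, 9), (3, 1, 3), (1, 2, 3), (2, 1, 3)])

def prefix_range(prefix: tuple) -> list:
    # Every entry that starts with `prefix` sits in one contiguous slice, so
    # two binary searches find its bounds (what WHERE A = .. AND B = .. can use).
    lo = bisect_left(index, prefix)
    upper = prefix + (float("inf"),) * (3 - len(prefix))
    hi = bisect_right(index, upper)
    return index[lo:hi]

def scan_for_c(c: int) -> list:
    # C is not a leftmost column, so matching entries are not contiguous:
    # the only option is to check every entry (a full scan).
    return [entry for entry in index if entry[2] == c]

print(prefix_range((2,)))    # [(2, 1, 3), (2, 2, 7)]
print(prefix_range((1, 2)))  # [(1, 2, 3)]
print(scan_for_c(3))         # [(1, 2, 3), (2, 1, 3), (3, 1, 3)]
```

The three C = 3 matches sit at positions 1, 2 and 4 of the sorted list, which is why no single range covers them.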

So, it also depends on the nature of your queries. If you always ask for WHERE pid = ... AND sid = ..., then the most selective column (product ID) should go first, but if you often ask for WHERE sid = XXXX by itself, then sid should go first (or just create another index for that case if your workload mixes both kinds of queries). The trade-off here is time versus space: an additional index serves a different class of queries at the cost of extra disk space and increased write I/O.
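One way to see the extra-index trade-off without a MySQL server is Python’s bundled sqlite3 module. This swaps in SQLite, whose planner is not MySQL’s, although its B-tree indexes follow the same leftmost-prefix rule. The table, the `idx_sid` index and the `plan` helper are hypothetical, and the exact EXPLAIN QUERY PLAN wording varies between SQLite versions. The sketch assumes a (pid, sid) primary key: a sid-only query falls back to a full table scan until a separate index on sid exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: composite primary key ordered (pid, sid), plus a
# non-indexed column so SELECT * cannot be answered from the index alone.
conn.execute(
    "CREATE TABLE items (pid INTEGER, sid INTEGER, name TEXT, "
    "PRIMARY KEY (pid, sid))"
)

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

both = plan("SELECT * FROM items WHERE pid = 2109418034 AND sid = 22746")
group_before = plan("SELECT * FROM items WHERE sid = 22746")

# The extra index costs disk space and write I/O, but serves sid-only lookups.
conn.execute("CREATE INDEX idx_sid ON items (sid)")
group_after = plan("SELECT * FROM items WHERE sid = 22746")

print(both)          # a SEARCH using the primary-key index
print(group_before)  # a SCAN of the whole table
print(group_after)   # a SEARCH using idx_sid
```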

Finally, if you are using InnoDB, the primary key is a “clustered” index that stores the rows themselves in key order (MyISAM tables are basically heaps). If you cluster the rows by (sid, pid), all products of a group are stored together, so you can fetch entire BLOCKS (or pages) of products at a time, which uses vastly less I/O than fetching each row through its own B-tree lookup (ref http://www.xaprb.com/blog/2006/07/04/how-to-exploit-mysql-index-optimizations/).
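A rough way to picture the I/O difference is to count how many pages hold one group’s rows. The Python sketch below is a deliberately simplified model with hypothetical parameters (100-row pages, group ids 0 to 4 with 1,000 rows each, no B-tree overhead; real InnoDB pages are 16 KB by default). The heap stores rows in arbitrary arrival order, simulated with a shuffle, while the clustered table stores them in (sid, pid) order.

```python
import random

PAGE_ROWS = 100  # hypothetical rows per page

def pages_touched(stored_rows: list, sid: int) -> int:
    # A row's page is its storage position divided by rows-per-page; count the
    # distinct pages that hold at least one row of the requested group.
    return len({pos // PAGE_ROWS
                for pos, (row_sid, _) in enumerate(stored_rows)
                if row_sid == sid})

rng = random.Random(0)
# Hypothetical data: group ids 0..4, 1,000 random product ids each.
rows = [(sid, rng.getrandbits(31)) for sid in range(5) for _ in range(1000)]

heap = rows[:]
rng.shuffle(heap)         # heap: rows sit wherever they happened to arrive
clustered = sorted(rows)  # clustered on (sid, pid): rows stored in key order

clustered_pages = pages_touched(clustered, 2)
heap_pages = pages_touched(heap, 2)
print(f"clustered: {clustered_pages} pages, heap: {heap_pages} pages")
```

With clustering, group 2 occupies exactly 10 consecutive pages (1,000 rows / 100 rows per page); in the shuffled heap its rows are spread across nearly all 50 pages.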

So, you can see why “test it and see” is useful, but without an understanding of MySQL index fundamentals you miss out on a whole class of optimizations.

