I have an SQL query that counts the number of results for a complex query. The actual select query is very fast when limited to 20 results, but the count version takes about 4.5 seconds on my current tables, even after a lot of optimizing.
If I remove the two joins and where clauses on site tags and gallery tags, the query runs in 1.5 seconds. If I create 3 separate queries – one to select the pay sites, one to select the names and one to pull everything together – I can get the query down to 0.6 seconds, which is still not good enough. This would also force me to use a stored procedure, since I would have to make a total of 4 queries in Hibernate.
For the query “as is”, here is some info:
The Handler_read_key is 1746669
The Handler_read_next is 1546324
The gallery table has 40,000 rows
The site table has 900 rows
The name table has 800 rows
The tag table has 3560 rows
I’m pretty new to MySQL and tuning, and I have indexes on the:
- ‘term’ column in the tag table
- ‘published’ column in the gallery table
- ‘value’ for the name table
I am looking to get this query to 0.1 milliseconds.
SELECT count(distinct gallery.id)
from gallery gallery
inner join
site site
on gallery.site_id = site.id
inner join
site_to_tag p2t
on site.id = p2t.site_id
inner join
tag site_tag
on p2t.tag_id = site_tag.id
inner join
gallery_to_name g2mn
on gallery.id = g2mn.gallery_id
inner join
name name
on g2mn.name_id = name.id
inner join
gallery_to_tag g2t
on gallery.id = g2t.gallery_id
inner join
tag tag
on g2t.tag_id = tag.id
where
gallery.published = true and (
name.value LIKE 'sometext%' or
tag.term = 'sometext' or
site.`name` like 'sometext%' or
site_tag.term = 'sometext'
)
Explain Data:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+
| 1 | SIMPLE | site | index | PRIMARY,nameIndex | nameIndex | 258 | NULL | 950 | Using index; Using temporary |
| 1 | SIMPLE | gallery | ref | PRIMARY,publishedIndex,FKF44C775296EECE37,publishedSiteIdIndex | FKF44C775296EECE37 | 9 | production.site.id | 20 | Using where |
| 1 | SIMPLE | g2mn | ref | PRIMARY,FK3EFFD7F8AFAD7A5E,FK3EFFD7F832C04188 | FK3EFFD7F8AFAD7A5E | 8 | production.gallery.id | 1 | Using index; Distinct |
| 1 | SIMPLE | name | eq_ref | PRIMARY,valueIndex | PRIMARY | 8 | production.g2mn.name_id | 1 | Distinct |
| 1 | SIMPLE | g2t | ref | PRIMARY,FK3DDB4D63AFAD7A5E,FK3DDB4D63E210FBA6 | FK3DDB4D63AFAD7A5E | 8 | production.g2mn.gallery_id | 2 | Using where; Using index; Distinct |
| 1 | SIMPLE | tag | eq_ref | PRIMARY,termIndex | PRIMARY | 8 | production.g2t.tag_id | 1 | Distinct |
| 1 | SIMPLE | p2t | ref | PRIMARY,FK29424AB796EECE37,FK29424AB7E210FBA6 | PRIMARY | 8 | production.gallery.site_id | 3 | Using where; Using index; Distinct |
| 1 | SIMPLE | site_tag | eq_ref | PRIMARY,termIndex | PRIMARY | 8 | production.p2t.tag_id | 1 | Using where; Distinct |
+----+-------------+--------------+--------+-------------------------------------------------------------------+--------------------+---------+-------------------------------------------+------+------------------------------------+
Individual Count Speeds:
[SQL] select count(*) from gallery;
Affected rows: 0
Time: 0.014ms
Results: 40385
[SQL]
select count(*) from gallery_to_name;
Affected rows: 0
Time: 0.012ms
Results: 35615
[SQL]
select count(*) from gallery_to_tag;
Affected rows: 0
Time: 0.055ms
Results: 165104
[SQL]
select count(*) from tag;
Affected rows: 0
Time: 0.002ms
Results: 3560
[SQL]
select count(*) from site;
Affected rows: 0
Time: 0.001ms
Results: 901
[SQL]
select count(*) from site_to_tag;
Affected rows: 0
Time: 0.003ms
Results: 7026
I’ve included my test schema and a script to produce test data at the end of this post. I have used the SQL_NO_CACHE option to prevent MySQL from caching query results – this is just for testing and should ultimately be removed.
This is a similar idea to that proposed by Donnie, but I have tidied it up a little. If I have understood the joins correctly, there is no need to repeat all the joins in each select, as each is effectively independent from the others. The original WHERE clause stipulates that gallery.published must be true and then follows with a series of 4 conditions joined by OR. Each query can therefore be executed separately. Here are the four joins:
Because gallery contains site_id, in this case, there’s no need for the intermediate join via the site table. The last join can therefore be reduced to this:
Running each SELECT separately, and using UNION to combine the results, is very fast. The results here assume the table structures and indexes shown at the end of this post:
The speed does vary depending on the search criteria. In the following example, a different search value is used for each table, and the LIKE operator has to do a little more work, as there are now more potential matches for each:
These results compare favourably with a query which uses multiple joins.
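To make the rewrite described above concrete, here is a minimal sketch of the four independent selects combined with UNION and wrapped in a count. It assumes the table and column names from the question's query; the aliases (g, g2n, n and so on) and the names matches and total are illustrative only, and 'sometext' is the question's placeholder search value. Note that, unlike the original query, a gallery is counted as soon as any one branch matches, so galleries that have no name or tag rows are no longer excluded by the chain of inner joins.

```sql
-- Sketch only: names come from the question's query, aliases are illustrative.
-- UNION (without ALL) removes duplicate gallery ids, so COUNT(*) over the
-- derived table plays the role of COUNT(DISTINCT gallery.id).
SELECT COUNT(*) AS total
FROM (
    -- 1. Published galleries with a matching name
    SELECT g.id
    FROM gallery g
    INNER JOIN gallery_to_name g2n ON g2n.gallery_id = g.id
    INNER JOIN name n ON n.id = g2n.name_id
    WHERE g.published = TRUE
      AND n.value LIKE 'sometext%'

    UNION

    -- 2. Published galleries with a matching gallery tag
    SELECT g.id
    FROM gallery g
    INNER JOIN gallery_to_tag g2t ON g2t.gallery_id = g.id
    INNER JOIN tag t ON t.id = g2t.tag_id
    WHERE g.published = TRUE
      AND t.term = 'sometext'

    UNION

    -- 3. Published galleries whose site name matches
    SELECT g.id
    FROM gallery g
    INNER JOIN site s ON s.id = g.site_id
    WHERE g.published = TRUE
      AND s.`name` LIKE 'sometext%'

    UNION

    -- 4. Published galleries whose site has a matching tag; gallery.site_id
    --    joins straight to site_to_tag, skipping the site table
    SELECT g.id
    FROM gallery g
    INNER JOIN site_to_tag s2t ON s2t.site_id = g.site_id
    INNER JOIN tag st ON st.id = s2t.tag_id
    WHERE g.published = TRUE
      AND st.term = 'sometext'
) AS matches;
```

Each branch filters on a single indexed column (name.value, tag.term or site.name), which gives the optimizer a chance to start from the small lookup table rather than scanning the joined result for an OR across four tables.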
SCHEMA
The indexes on id columns plus site.name, name.value and tag.term are important:
TEST DATA
This populates site with 900 rows, tag with 3560 rows, name with 800 rows and gallery with 40,000 rows, and inserts entries into the link tables: