I am not sure on how to make a decent index that will capture category/log_code properly. Maybe I also need to change my query? Appreciate any input!
All SELECTS contain:
SELECT logentry_id, date, log_codes.log_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
ORDER BY logentry_id DESC
Query can be as above, but usually has a WHERE to specify the category of log_codes to show, and/or partner, and/or customer. Examples of WHERE:
WHERE partner_id = 1
WHERE log_codes.category_overview = 1
WHERE partner_id = 1 AND log_codes.category_overview = 1
WHERE partner_id = 1 AND customer_id = 1 AND log_codes.category_overview = 1
Database structure:
CREATE TABLE IF NOT EXISTS `log_codes` (
`log_code` smallint(6) NOT NULL,
`log_desc` varchar(255),
`category_mail` tinyint(1) NOT NULL,
`category_overview` tinyint(1) NOT NULL,
`category_cron` tinyint(1) NOT NULL,
`category_documents` tinyint(1) NOT NULL,
`category_error` tinyint(1) NOT NULL,
PRIMARY KEY (`log_code`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE IF NOT EXISTS `log_entries` (
`logentry_id` int(11) NOT NULL AUTO_INCREMENT,
`date` datetime NOT NULL,
`log_code` smallint(6) NOT NULL,
`partner_id` int(11) NOT NULL,
`customer_id` int(11) NOT NULL,
PRIMARY KEY (`logentry_id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 ;
EDIT: Added indexes on fields, here is output of SHOW INDEXES:
+-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | log_codes | 0 | PRIMARY | 1 | log_code | A | 97 | NULL | NULL | | BTREE | | | | log_codes | 1 | category_mail | 1 | category_mail | A | 1 | NULL | NULL | | BTREE | | | | log_codes | 1 | category_overview | 1 | category_overview | A | 1 | NULL | NULL | | BTREE | | | | log_codes | 1 | category_cron | 1 | category_cron | A | 1 | NULL | NULL | | BTREE | | | | log_codes | 1 | category_documents | 1 | category_documents | A | 1 | NULL | NULL | | BTREE | | | | log_codes | 1 | category_error | 1 | category_error | A | 1 | NULL | NULL | | BTREE | | | +-----------+------------+-----------------------+--------------+-----------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ +-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | +-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+ | log_entries | 0 | PRIMARY | 1 | logentry_id | A | 163020 | NULL | NULL | | BTREE | | | | log_entries | 1 | log_code | 1 | log_code | A | 90 | NULL | NULL | | BTREE | | | | log_entries | 1 | partner_id | 1 | partner_id | A | 6 | NULL | NULL | YES | BTREE | | | | log_entries | 1 | customer_id | 1 | customer_id | A | 20377 | NULL | NULL | YES | BTREE | | | +-------------+------------+--------------+--------------+--------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
EDIT 2: Added composite indexes: (log_code, category_overview) and (log_code, category_overview) on log_codes. (customer_id, partner_id) on log_entries.
Here are some EXPLAIN output (query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC
+----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+ | 1 | SIMPLE | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort | | 1 | SIMPLE | log_codes | eq_ref | PRIMARY,code_overview,overview_code | PRIMARY | 2 | log_entries.log_code | 1 | Using where | +----+-------------+-------------+--------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
But I also have some LEFT JOINs that I did not think would affect the index design, but they cause a “Using temporary” problem. Here is EXPLAIN output (query returns 66818 rows):
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries
INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
LEFT JOIN partners ON log_entries.partner_id = partners.partner_id
LEFT JOIN joined_table1 ON log_entries.t1_id = joined_table1.t1_id
LEFT JOIN joined_table2 ON log_entries.t2_id = joined_table2.t2_id
LEFT JOIN joined_table3 ON log_entries.t3_id = joined_table3.t3_id
LEFT JOIN joined_table4 ON joined_table3.t4_id = joined_table4.t4_id
LEFT JOIN joined_table5 ON log_entries.t5_id = joined_table5.t5_id
LEFT JOIN joined_table6 ON log_entries.t6_id = joined_table6.t6_id
WHERE log_entries.partner_id = 1 AND log_codes.category_overview = 1 ORDER BY logentry_id DESC;
+----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+ | 1 | SIMPLE | log_codes | ref | PRIMARY,code_overview,overview_code | overview_code | 1 | const | 54 | Using where; Using temporary; Using filesort | | 1 | SIMPLE | log_entries | ref | log_code,partner_id | log_code | 2 | log_codes.log_code | 1811 | Using where | | 1 | SIMPLE | partners | const | PRIMARY | PRIMARY | 2 | const | 1 | Using index | | 1 | SIMPLE | joined_table1 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t1_id | 1 | Using index | | 1 | SIMPLE | joined_table2 | eq_ref | PRIMARY | PRIMARY | 1 | log_entries.t2_id | 1 | Using index | | 1 | SIMPLE | joined_table3 | eq_ref | PRIMARY | PRIMARY | 3 | log_entries.t3_id | 1 | | | 1 | SIMPLE | joined_table4 | eq_ref | PRIMARY | PRIMARY | 3 | joined_table3.t4_id | 1 | Using index | | 1 | SIMPLE | joined_table5 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t5_id | 1 | Using index | | 1 | SIMPLE | joined_table6 | eq_ref | PRIMARY | PRIMARY | 4 | log_entries.t6_id | 1 | Using index | +----+-------------+---------------+--------+-------------------------------------+---------------+---------+--------------------------+------+----------------------------------------------+
Don’t know if it’s a good or bad idea, but a subquery seems to get rid of the “Using temporary”. Here is EXPLAIN output of two common scenarios. This query returns 66818 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+ | 1 | PRIMARY | log_entries | ref | log_code,partner_id | partner_id | 2 | const | 156110 | Using where; Using filesort | | 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | | | 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where | +----+--------------------+-------------+-----------------+-------------------------------------+------------+---------+----------------------+--------+-----------------------------+
And a overview on customer, query returns 12 rows:
EXPLAIN SELECT log_entries.logentry_id, log_entries.date, log_codes.log_code_desc FROM log_entries INNER JOIN log_codes ON log_entries.log_code = log_codes.log_code
WHERE log_entries.partner_id = 1 AND log_entries.customer_id = 10000
AND log_entries.log_code IN (SELECT log_code FROM log_codes WHERE category_overview = 1) ORDER BY logentry_id DESC;
+----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+ | 1 | PRIMARY | log_entries | ref | log_code,partner_id,customer_id,customer_partner | customer_id | 4 | const | 27 | Using where; Using filesort | | 1 | PRIMARY | log_codes | eq_ref | PRIMARY,code_overview | PRIMARY | 2 | log_entries.log_code | 1 | | | 2 | DEPENDENT SUBQUERY | log_codes | unique_subquery | PRIMARY,code_overview,overview_code | PRIMARY | 2 | func | 1 | Using where | +----+--------------------+-------------+-----------------+--------------------------------------------------+--------------+---------+----------------------+------+-----------------------------+
Well done for taking the time to update your question with the detail requested. I am sorry if that sounds patronising but it is amazing the number people who are not prepared to take the time to help themselves.
Adding a composite index across (customer_id, partner_id) on the log_entries table should give a significant benefit for the last of your example where clauses.
The output of your SHOW INDEXES for the log_codes table would suggest that it is not currently populated as it shows NULL for all but the PK. Is this the case?
EDIT Sorry. Just read your comment to KAJ’s answer detailing table content. It might be worth running that SHOW INDEXES statement again as it looks like MySQL may have been building its stats.
Adding a composite index across (log_code, category_overview) for the log_codes table should help but you will need to check the explain output to see if it is being used.
As a very crude general rule you want to create composite indices starting with the columns with the highest cardinality but this is not always the case. It will depend heavily on data distribution and query structure.
UPDATE I have created a mockup of your dataset and added the following indices. They give significant improvement based on your sample WHERE clauses –
The last index is quite expensive in terms of disk space and degradation of insert performance but gives very fast SELECT based on your final WHERE clause example. My sample dataset has just over 1M records in the log_entries table with quite even distribution across the partner and customer IDs. Three of your sample WHERE clauses execute in less than a second but the one with category_overview as the only criterion is very slow although still sub-second with only 200k rows.