According to MySQL docs a composite index will still be used if the leftmost fields are part of the criteria. However, this table will not join correctly with the primary key; I had to add another index of the left two fields which is then used.
One of the tables is memory, and I know that by default memory uses a hash index which can’t be used for group/order. However I’m using all rows of the memory table and not the index, so I don’t think that relates to the problem.
What am I missing?
mysql> show create table pr_temp;
| pr_temp | CREATE TEMPORARY TABLE `pr_temp` (
`player_id` int(10) unsigned NOT NULL,
`insert_date` date NOT NULL,
[...]
PRIMARY KEY (`player_id`,`insert_date`) USING BTREE,
KEY `insert_date` (`insert_date`)
) ENGINE=MEMORY DEFAULT CHARSET=utf8 |
mysql> show create table player_game_record;
| player_tank_record | CREATE TABLE `player_game_record` (
`player_id` int(10) unsigned NOT NULL,
`game_id` smallint(5) unsigned NOT NULL,
`insert_date` date NOT NULL,
[...]
PRIMARY KEY (`player_id`,`insert_date`,`game_id`),
KEY `insert_date` (`insert_date`),
KEY `player_date` (`player_id`,`insert_date`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 DATA DIRECTORY='...' INDEX DIRECTORY='...' |
mysql> explain select pgr.* from player_game_record pgr inner join pr_temp on pgr.player_id = pr_temp.player_id and pgr.insert_date = pr_temp.date_prev;
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
| 1 | SIMPLE | pr_temp | ALL | PRIMARY | NULL | NULL | NULL | 174683 | |
| 1 | SIMPLE | pgr | ref | PRIMARY,insert_date,player_date | player_date | 7 | test_gamedb.pr_temp.player_id,test_gamedb.pr_temp.date_prev | 21 | |
+----+-------------+---------+------+---------------------------------+-------------+---------+-------------------------------------------------------------------------+--------+-------+
2 rows in set (0.00 sec)
mysql> explain select pgr.* from player_game_record pgr force index (primary) inner join pr_temp on pgr.player_id = pr_temp.player_id and pgr.insert_date = pr_temp.date_prev;
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
| 1 | SIMPLE | pr_temp | ALL | PRIMARY | NULL | NULL | NULL | 174683 | |
| 1 | SIMPLE | pgr | ref | PRIMARY | PRIMARY | 7 | test_gamedb.pr_temp.player_id,test_gamedb.pr_temp.date_prev | 2873031 | |
+----+-------------+---------+------+---------------+---------+---------+-------------------------------------------------------------------------+---------+-------+
2 rows in set (0.00 sec)
I think the primary key should work, with the two left columns (player_id, insert_date) being used. However it will use the player_date index by default, and if I force it to use the primary index it looks like it’s only using one field rather than both.
Update2: Mysql version 5.5.27-log
Update3:
(note this is after removing the player_date index while trying some other tests)
mysql> show indexes in player_game_record;
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| player_game_record | 0 | PRIMARY | 1 | player_id | A | NULL | NULL | NULL | | BTREE | | |
| player_game_record | 0 | PRIMARY | 2 | insert_date | A | NULL | NULL | NULL | | BTREE | | |
| player_game_record | 0 | PRIMARY | 3 | game_id | A | 576276246 | NULL | NULL | | BTREE | | |
| player_game_record | 1 | insert_date | 1 | insert_date | A | 33304 | NULL | NULL | | BTREE | | |
+--------------------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
4 rows in set (1.08 sec)
mysql> select count(*) from player_game_record;
+-----------+
| count(*) |
+-----------+
| 576276246 |
+-----------+
1 row in set (0.00 sec)
I agree that your use of the
MEMORYstorage engine for one of the tables should not at all be an issue here, since we’re talking about the other table.I also agree that the leftmost prefix of an index can be used exactly how you are trying to use it, and I cannot think of any reason why the primary key could not be used in exactly the same way as any other index.
This has been a head-scratcher. The new index you created “should” be the same as the left side of the primary key, so why don’t they behave the same way? I have two thoughts, both of which lead me to the same recommendation, even though I am not as familiar with the internals of MyISAM as I am with InnoDB. (As an aside, I’d recommend InnoDB over MyISAM.)
The index on your primary key was presumably on the table when you began inserting data, while the new index was added while most or all of the data was already there. This suggests that your new index is nice and cleanly-organized internally, while your primary key index may be highly fragmented, having been built as the data was loaded.
The row count the optimizer shows is based on index statistics, which may be inaccurate on your primary key due to the insert order.
The fragmentation theory may explain why querying with the primary key as your index is not as fast; the index statistics theory may explain why the optimizer comes up with such a different row count and it may explain why the optimizer might have been choosing a full table scan instead of using that index (which is only a guess, since we don’t have the explain available).
The thing I would suggest based on these two thoughts is running
OPTIMIZE TABLEon your table. If it took 12 hours to build that new index, then optimizing the table may very possibly take that long or longer.Possibly helpful: http://www.dbasquare.com/2012/07/09/data-fragmentation-problem-in-mysql-myisam/