We have a `users` table that holds information about our users. One of the fields in this table is called `query`. I am trying to SELECT the user IDs of all users that have the same query, so my output should look like this:
user1_id  user2_id  common_query
43        2         "foo"
117       433       "bar"
1         119       "baz"
1         52        "qux"
Unfortunately, I can’t get this query to finish in under an hour (the users table is pretty big). This is my current query:
SELECT u1.id,
       u2.id,
       u1.query
FROM users u1
INNER JOIN users u2
        ON u1.query = u2.query
       AND u1.id <> u2.id
My EXPLAIN output:
+----+-------------+-------+-------+----------------------+----------------------+---------+---------------------------------+----------+--------------------------+
| id | select_type | table | type  | possible_keys        | key                  | key_len | ref                             | rows     | Extra                    |
+----+-------------+-------+-------+----------------------+----------------------+---------+---------------------------------+----------+--------------------------+
|  1 | SIMPLE      | u1    | index | index_users_on_query | index_users_on_query | 768     | NULL                            | 10905267 | Using index              |
|  1 | SIMPLE      | u2    | ref   | index_users_on_query | index_users_on_query | 768     | u1.query                        |       11 | Using where; Using index |
+----+-------------+-------+-------+----------------------+----------------------+---------+---------------------------------+----------+--------------------------+
As you can see from the EXPLAIN output, the users table is indexed on `query`, and the index appears to be used in my SELECT. I'm wondering why the `rows` column for table u2 has a value of 11 and not 1. Is there anything I can do to speed this query up? Is my `<>` comparison within the join bad practice? Also, the `id` field is the primary key.
My biggest concern is the `key_len`, which indicates that MySQL must compare up to 768 bytes in order to look up each index entry. For this query, a hash index on `query` could be much more performant (as it would involve substantially shorter comparisons, at the cost of calculating hashes and being unable to sort records using that index).
You might also consider making this a composite index on `(query, id)` so that MySQL need not scan into the record itself to test the `<>` criterion.
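As a rough sketch of the two index suggestions above, the changes might look like the following. The index name `index_users_on_query` is taken from the EXPLAIN output; `index_users_on_query_and_id` is a made-up name for illustration. One caveat: as far as I know, only some storage engines (for example MEMORY and NDB) actually build hash indexes, while InnoDB and MyISAM accept `USING HASH` but create a B-tree index anyway, so it is worth checking the engine of `users` first.

```sql
-- Check which storage engine the table uses before relying on a HASH index.
SHOW TABLE STATUS LIKE 'users';

-- Option 1: rebuild the existing index on `query` as a hash index.
-- Assumes the engine supports HASH; otherwise MySQL falls back to BTREE.
ALTER TABLE users
  DROP INDEX index_users_on_query,
  ADD INDEX index_users_on_query USING HASH (query);

-- Option 2: a composite index so the `<>` test on id can be answered
-- from the index alone. The index name here is hypothetical.
ALTER TABLE users
  ADD INDEX index_users_on_query_and_id (query, id);
```

After either change, re-running EXPLAIN on the original SELECT should show whether the new index is chosen and whether `key_len` and the `Extra` column have changed.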