I have the following MySQL query:
SELECT pool.username
FROM pool
LEFT JOIN sent ON pool.username = sent.username
AND sent.campid = 'YA1LGfh9'
WHERE sent.username IS NULL
AND pool.gender = 'f'
AND (`location` = 'united states' OR `location` = 'us' OR `location` = 'usa');
The problem is that the pool table contains millions of rows and this query takes over 12 minutes to complete. I realize that in this query, the entire left table (pool) is being scanned. The pool table has an auto incremented id row.
I would like to split this query into multiple queries so that rather than scanning the entire pool table I scan 1000 rows at a time and in the next query I would pick up where I left off (1000-2000,2000-3000) and so on using the id column to keep track.
How can I specify this in my query? Please show examples if you know the answer. Thank you.
here are my indexes if it helps:
mysql> show index from main.pool;
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| pool | 0 | PRIMARY | 1 | id | A | 9275039 | NULL | NULL | | BTREE | |
| pool | 1 | username | 1 | username | A | 9275039 | NULL | NULL | | BTREE | |
| pool | 1 | source | 1 | source | A | 1 | NULL | NULL | | BTREE | |
| pool | 1 | location | 1 | location | A | 38168 | NULL | NULL | | BTREE | |
| pool | 1 | pdex | 1 | gender | A | 2 | NULL | NULL | | BTREE | |
| pool | 1 | pdex | 2 | username | A | 9275039 | NULL | NULL | | BTREE | |
| pool | 1 | pdex | 3 | id | A | 9275039 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
8 rows in set (0.00 sec)
mysql> show index from main.sent;
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| sent | 0 | PRIMARY | 1 | primary_key | A | 351 | NULL | NULL | | BTREE | |
| sent | 1 | username | 1 | username | A | 175 | NULL | NULL | | BTREE | |
| sent | 1 | sdex | 1 | campid | A | 7 | NULL | NULL | | BTREE | |
| sent | 1 | sdex | 2 | username | A | 351 | NULL | NULL | | BTREE | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
and here is the explain for my query:
----------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+------+---------+-------+---------+--------------------------------------+
| 1 | SIMPLE | pool | ref | location,pdex | pdex | 5 | const | 6084332 | Using where |
| 1 | SIMPLE | sent | index | sdex | sdex | 309 | NULL | 351 | Using where; Using index; Not exists |
+----+-------------+-------+-------+---------------+------+---------+-------+---------+--------------------------------------+
here is the structure of the pool table:
| pool | CREATE TABLE `pool` (
`id` int(20) NOT NULL AUTO_INCREMENT,
`username` varchar(50) CHARACTER SET utf8 NOT NULL,
`source` varchar(10) CHARACTER SET utf8 NOT NULL,
`gender` varchar(1) CHARACTER SET utf8 NOT NULL,
`location` varchar(50) CHARACTER SET utf8 NOT NULL,
PRIMARY KEY (`id`),
KEY `username` (`username`),
KEY `source` (`source`),
KEY `location` (`location`),
KEY `pdex` (`gender`,`username`,`id`)
) ENGINE=MyISAM AUTO_INCREMENT=9327026 DEFAULT CHARSET=latin1 |
here is the structure of the sent table:
| sent | CREATE TABLE `sent` (
`primary_key` int(50) NOT NULL AUTO_INCREMENT,
`username` varchar(50) NOT NULL,
`from` varchar(50) NOT NULL,
`campid` varchar(255) NOT NULL,
`timestamp` int(20) NOT NULL,
PRIMARY KEY (`primary_key`),
KEY `username` (`username`),
KEY `sdex` (`campid`,`username`)
) ENGINE=MyISAM AUTO_INCREMENT=352 DEFAULT CHARSET=latin1 |
This produces a syntax error but this WHERE clause in the beginning is what im after:
SELECT pool.username
FROM pool
WHERE id < 1000
LEFT JOIN sent ON pool.username = sent.username
AND sent.campid = 'YA1LGfh9'
WHERE sent.username IS NULL
AND pool.gender = 'f'
AND (location = 'united states' OR location = 'us' OR location = 'usa');
Splitting your query does not sound like the right approach.
The better way would be to fetch some records from your existing query, send your messages and then continue fetching.
Your query could benefit from another compound index on
This should allow to run your complete query from
sdexand your new index.If you really want to split the query, an easy approach could be to
and then loop from min to max in steps of
1000, and addid >= r AND id < r+1000to your query.This could return
0rows if you have gaps, but it would never return more than 1000 rows at once. A different compound index onpoolincluding (id,location,genderand maybeusername) could help for this query.