Table Schema
For the two tables, the CREATE queries are given below:
Table1: (file_path_key, dir_path_key)
create table Table1(
file_path_key varchar(500),
dir_path_key varchar(500),
primary key(file_path_key))
engine = innodb;
Table2: (file_path_key, hash_key)
create table Table2(
file_path_key varchar(500) not null,
hash_key bigint(20) not null,
foreign key (file_path_key) references Table1(file_path_key) on update cascade on delete cascade)
engine = innodb;
Objective:
Given a file_path F and its dir_path string D, I need to find all those
file names which have at least one hash in the set of hashes of F, but
don’t have their directory names as D. If a file F1 shares multiple hashes
with F, then it should be repeated that many times.
Note that the file_path_key column in Table1 and the hash_key column in Table2 are indexed.
In this particular case, Table1 has around 350,000 entries and Table2 has 31,167,119 entries, which makes my current query slow:
create table temp
as select hash_key from Table2
where file_path_key = F;
select s1.file_path_key
from Table1 as s1
join Table2 as s2
on s1.file_path_key = s2.file_path_key
join temp
on temp.hash_key = s2.hash_key
where s1.dir_path_key != D;
How can I speed up this query?
I do not understand the purpose of the temp table, but remember that a table created with CREATE .. SELECT does not have any indexes. So at the very least, change that statement so that temp gets an index on hash_key. Otherwise the other SELECT has to scan temp in full for the join, so it might be very slow.
I would also suggest using a numerical primary key (INT, BIGINT) in Table1 and referencing it from Table2 rather than the text column.
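As a rough sketch of the numerical-key suggestion, the schema could look something like the following. The file_id column name and the AUTO_INCREMENT surrogate key are my assumptions for illustration, not part of the original schema. The unique key on file_path_key is kept so that lookups by path still use an index.

```sql
-- Sketch only: file_id is a hypothetical surrogate key, not in the original schema.
create table Table1(
  file_id int unsigned not null auto_increment,
  file_path_key varchar(500) not null,
  dir_path_key varchar(500),
  primary key (file_id),
  unique key (file_path_key)   -- paths stay unique and indexed
) engine = innodb;

create table Table2(
  file_id int unsigned not null,
  hash_key bigint(20) not null,
  key (hash_key),              -- same index the question says already exists
  foreign key (file_id) references Table1(file_id)
    on update cascade on delete cascade
) engine = innodb;
```

With this layout, joins between the two tables compare 4-byte integers instead of strings of up to 500 characters.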
Queries joining the two tables may be a lot faster if integer columns are used in the join predicate rather than text ones.
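Coming back to the temp table, one way to give it an index is to put the index definition into the CREATE .. SELECT statement itself, which MySQL allows. This is only a sketch: 'F' and 'D' stand for the actual file path and directory path values, and the explicit join condition on file_path_key is my assumption about what the query in the question intended.

```sql
-- Create temp and index hash_key in a single statement.
create table temp (index (hash_key))
as select hash_key from Table2
where file_path_key = 'F';

-- Start from the small temp table, then use the indexes on
-- Table2.hash_key and Table1.file_path_key for the lookups.
select s1.file_path_key
from temp
join Table2 as s2 on s2.hash_key = temp.hash_key
join Table1 as s1 on s1.file_path_key = s2.file_path_key
where s1.dir_path_key != 'D';
```

Running EXPLAIN on the second statement should show whether the indexes are actually being used on your data.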