Table Schema
For the two tables, the CREATE queries are given below:
Table1: (file_path_key, dir_path_key)
create table Table1(
    file_path_key varchar(500),
    dir_path_key varchar(500),
    primary key(file_path_key)
) engine = innodb;
For example:
file_path_key = /home/playstation/a.txt
dir_path_key = /home/playstation/
Table2: (file_path_key, hash_key)
create table Table2(
    file_path_key varchar(500) not null,
    hash_key bigint(20) not null,
    foreign key (file_path_key) references Table1(file_path_key)
        on update cascade on delete cascade
) engine = innodb;
Objective:
Given a hash value *H* and a directory string *D*, I need to find all the
entries in Table2 whose hash equals *H*, such that the corresponding file entry
doesn't have *D* as its directory.
In this particular case, Table1 has around 40,000 entries and Table2 has 5,000,000 entries, which makes my current query really slow.
select distinct s1.file_path_key
from Table1 as s1
join (select * from Table2 where hash_key = H) as s2
  on s1.file_path_key = s2.file_path_key
 and s1.dir_path_key != D;
The sub-select is slowing your query down unnecessarily.
You should remove it and replace it with a simple join, pushing all of the non-join criteria down into the WHERE clause.
You should also add indexes on the Table1.dir_path_key and Table2.hash_key columns.
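For the indexes, a minimal sketch could look like the following. The index names idx_table2_hash_key and idx_table1_dir_path_key are only placeholders I made up, so use whatever naming you prefer:

```sql
-- Index the column used to filter Table2 by hash value (hash_key = H).
-- The index name is hypothetical.
create index idx_table2_hash_key on Table2 (hash_key);

-- Index the directory column on Table1 that is compared against D.
-- The index name is hypothetical.
create index idx_table1_dir_path_key on Table1 (dir_path_key);
```

Of the two, the index on hash_key is probably the one that matters most, since an equality match on H can narrow 5,000,000 rows down quickly, while a != comparison on dir_path_key usually excludes very few rows. Running EXPLAIN on the query before and after adding the indexes should show whether they are actually used.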
Try something like this for the query: