I am building a forward index on a wiki using MySQL. I am running into performance problems with queries and I am hoping for some help optimising either my schema or my queries.
The database is around 1GB and it has three tables:
- fi_page is the table of 800k wiki pages
- fi_keyword is a table of 70k keywords
  CREATE TABLE `fi_keyword` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `keyword` varchar(100) NOT NULL,
    PRIMARY KEY (`id`),
    UNIQUE KEY `keyword` (`keyword`)
  );
- fi_titlekeywordlink is a table with 6 million entries linking keywords to wiki pages
  CREATE TABLE `fi_titlekeywordlink` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `keyword_id` int(11) NOT NULL,
    `page_id` int(11) NOT NULL,
    PRIMARY KEY (`id`),
    KEY `fi_titlekeywordlink_a6434082` (`keyword_id`),
    KEY `fi_titlekeywordlink_c2d3d2bb` (`page_id`),
    CONSTRAINT `keyword_id_refs_id_67197756` FOREIGN KEY (`keyword_id`) REFERENCES `fi_keyword` (`id`),
    CONSTRAINT `paper_id_refs_id_705ddf03` FOREIGN KEY (`page_id`) REFERENCES `fi_page` (`id`)
  );
I am translating a search for ‘search terms galore’ into an SQL query such as
select p.*
from
fi_keyword as k0, fi_titlekeywordlink as l0,
fi_keyword as k1, fi_titlekeywordlink as l1,
fi_keyword as k2, fi_titlekeywordlink as l2,
fi_keyword as k3, fi_titlekeywordlink as l3,
fi_page as p
where
k0.keyword = e and k0.id = l0.keyword_id and p.id = l0.page_id
and k1.keyword = 'search' and k1.id = l1.keyword_id and p.id = l1.page_id
and k2.keyword = 'terms' and k2.id = l2.keyword_id and p.id = l2.page_id
and k3.keyword = 'galore' and k3.id = l3.keyword_id and p.id = l3.page_id
limit 1,10
However, this takes around half a second to run on my MBP. Do you have any suggestions on how to speed up this sort of operation, either by changing the schema or the query? I cannot use a separate search server in this case; the forward index must run on MySQL. Thank you.
At the cost of insertion performance, you could delete the surrogate id primary key columns from both tables and make your primary key index on the keyword column for fi_keyword and (keyword_id, page_id) as the primary key index for fi_titlekeywordlink. If you are using InnoDB, primary keys are clustered indexes, so they are much faster.
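As a rough, untested sketch, the primary key change for the link table could look like the statement below. It covers only fi_titlekeywordlink, because fi_keyword.id is still referenced by the foreign key shown in the question. It assumes that no (keyword_id, page_id) pair occurs more than once; if duplicates exist, the statement would fail and they would have to be removed first.

```sql
-- Assumption: every (keyword_id, page_id) pair in the table is unique.
-- Dropping the auto-increment column and its key in the same statement
-- avoids leaving an AUTO_INCREMENT column that is not part of any key.
ALTER TABLE fi_titlekeywordlink
  DROP PRIMARY KEY,
  DROP COLUMN id,
  ADD PRIMARY KEY (keyword_id, page_id);
```

On a table with 6 million rows this rebuilds the table, so it is worth trying on a copy first.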
Even if you don’t make this change, a compound (multi-column) index of (keyword_id, page_id) on fi_titlekeywordlink would improve performance because you would have a covering index (MySQL wouldn’t have to visit the table data) on fi_titlekeywordlink. This assumes that your MySQL server has enough RAM to fit all indexes in memory and that you’ve configured MySQL server to allow it to use enough RAM to make it so (configuration variables differ between MyISAM and InnoDB).
Sometimes, an implicit JOIN can get too complex for MySQL to properly optimize. You should also consider rewriting the query with explicit ANSI standard joins using JOIN and ON.
You probably just wrote SELECT p.* for brevity, but be sure to only select the columns that you require so that you’re not returning unneeded data. Only returning the columns that you need reduces the workload.
Also, the first row in a LIMIT clause is 0, so LIMIT 1, 10 skips the first row. Use LIMIT 0, 10 to get the first 10 rows.
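Putting these suggestions together, a sketch might look like the following. The index name idx_keyword_page is made up for illustration. The query uses only the three quoted search terms and selects only p.id, because the columns of fi_page are not shown in the question; list whichever columns you actually need.

```sql
-- Hypothetical index name; the column order (keyword_id, page_id) lets MySQL
-- resolve the keyword lookup and read page_id from the index alone.
ALTER TABLE fi_titlekeywordlink
  ADD INDEX idx_keyword_page (keyword_id, page_id);

-- The same search written with explicit JOIN ... ON clauses.
SELECT p.id
FROM fi_keyword AS k1
JOIN fi_titlekeywordlink AS l1
  ON l1.keyword_id = k1.id
JOIN fi_keyword AS k2
  ON k2.keyword = 'terms'
JOIN fi_titlekeywordlink AS l2
  ON l2.keyword_id = k2.id AND l2.page_id = l1.page_id
JOIN fi_keyword AS k3
  ON k3.keyword = 'galore'
JOIN fi_titlekeywordlink AS l3
  ON l3.keyword_id = k3.id AND l3.page_id = l1.page_id
JOIN fi_page AS p
  ON p.id = l1.page_id
WHERE k1.keyword = 'search'
LIMIT 0, 10;  -- offset 0 returns the first 10 rows
```

Prefixing the SELECT with EXPLAIN is a quick way to check the plan: if the compound index is being used as a covering index, the Extra column for the fi_titlekeywordlink rows should show "Using index".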