I’m trying to build a mini search engine for a site containing products. I’ve already considered full-text search, the LIKE clause, etc., but I still want to proceed my way because the database is going to be ridiculously huge (hundreds of millions of products).
The design goes something like this: I have a table pairing words to word IDs. I have another table that pairs each word ID with the IDs of the products that match that word. When a user searches for, say, “2gb memory card”, the script parses the query into “2gb”, “memory” and “card”.
Then I use:
SELECT pid
FROM indx_0
WHERE wid = 294 OR wid = 20591 OR wid = 330
I end up with pairs of words matching products.
I have a PHP algorithm that decides which products go to the top depending on multiple factors, but when I load 380k results into a PHP array, execution becomes ridiculously slow, so clearly I can’t do that. If I limit it to, say, 1000 results per word, execution is fast, but it doesn’t include all the possible results.
In the “indx_0” table, each “pid” (product ID) appears at most once per “wid” (word ID), and clearly some products are going to match more than one word. I want to retrieve the “pid”s that have the most matches against the “wid”s.
Say there are 2000 products matching “2gb” and 200,000 matching “card” and 50,000 matching “memory” but only 20 products that match ALL 3 of those words, and 200 products matching a combination of 2 of those words.
Is it possible to retrieve those 20 products as well as the 200 products that partially match?
What you probably need to do is group by the product ID and count the matching rows for each product, then order by that count, descending. For example, if one product matches all 3 wIDs and another matches just 1, the product with a count of 3 comes first in the list.
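The grouping idea above can be sketched roughly as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 with an in-memory database so it runs anywhere, the sample rows are made up, and the helper name rank_products is hypothetical. The table name indx_0, the columns wid and pid, and the three word IDs come from the question. COUNT(*) only equals the number of matched words if each (wid, pid) pair occurs once, which the question seems to imply.

```python
import sqlite3

# In-memory database standing in for the real index table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE indx_0 ("
    " wid INTEGER NOT NULL,"
    " pid INTEGER NOT NULL,"
    " PRIMARY KEY (wid, pid))"  # each word/product pair stored once
)

# Made-up sample rows: (word id, product id).
rows = [
    (294, 1), (20591, 1), (330, 1),  # product 1 matches all three words
    (294, 2), (330, 2),              # product 2 matches two words
    (330, 3),                        # product 3 matches one word
    (999, 4),                        # product 4 matches an unrelated word
]
conn.executemany("INSERT INTO indx_0 (wid, pid) VALUES (?, ?)", rows)


def rank_products(conn, word_ids, limit=1000):
    """Return (pid, matches) pairs, best-matching products first."""
    if not word_ids:
        return []
    placeholders = ", ".join("?" for _ in word_ids)
    sql = f"""
        SELECT pid, COUNT(*) AS matches
        FROM indx_0
        WHERE wid IN ({placeholders})
        GROUP BY pid
        ORDER BY matches DESC, pid ASC
        LIMIT ?
    """
    return conn.execute(sql, (*word_ids, limit)).fetchall()


ranked = rank_products(conn, [294, 20591, 330])
print(ranked)  # [(1, 3), (2, 2), (3, 1)]
```

Because the LIMIT is applied after sorting by match count, the full matches and then the partial matches come back first, and only that capped set has to be loaded into the application for further ranking. The same SELECT should carry over to other SQL databases with little or no change, though how well it performs on hundreds of millions of rows will depend on the indexes available.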