Let's say I have 5 documents stored as rows in my MySQL table, which has 2 columns, ‘document’ and ‘description’:
- Document 1: John and Nancy are best friends.
- Document 2: John, Casey, David, Nancy are best friends.
- Document 3: Nancy and Casey are best friends.
- Document 4: David is in a relationship with Casey. David and Casey are madly in love.
- Document 5: David and John are siblings.
So if the search query is “David Casey”, how do I score each of the 5 documents by how often the query terms occur in it, and rank the results by that score?
In this case, the result should be:
- Document 4 (because there are 2 ‘David’ and 2 ‘Casey’)
- Document 2 (1 ‘David’ and 1 ‘Casey’)
- Document 3 (1 ‘Casey’)
- Document 5 (1 ‘David’)
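The ranking above does not need any tf-idf weighting: it is just the total number of times the query words occur in each document. A minimal sketch in plain PHP, assuming the descriptions are already loaded into an array keyed by document name and that matching is case-insensitive on whole words (`rankByTermFrequency` is a hypothetical name, not an existing function):

```php
<?php
// Hypothetical helper: rank documents by the raw frequency of the query terms.
// Assumes case-insensitive, whole-word matching on ASCII letters.
function rankByTermFrequency(array $docs, string $query): array
{
    // "David Casey" -> ["david", "casey"]
    $terms = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);

    $scores = [];
    foreach ($docs as $name => $text) {
        // Tokenize the description into lowercase words and count each word.
        preg_match_all('/[a-z]+/', strtolower($text), $m);
        $counts = array_count_values($m[0]);

        // Total occurrences of all query terms in this document.
        $score = 0;
        foreach ($terms as $term) {
            $score += $counts[$term] ?? 0;
        }
        if ($score > 0) {
            $scores[$name] = $score; // documents with no match are left out
        }
    }

    arsort($scores); // highest total first
    return $scores;
}

$docs = [
    'Document 1' => 'John and Nancy are best friends.',
    'Document 2' => 'John, Casey, David, Nancy are best friends.',
    'Document 3' => 'Nancy and Casey are best friends.',
    'Document 4' => 'David is in a relationship with Casey. David and Casey are madly in love.',
    'Document 5' => 'David and John are siblings.',
];

print_r(rankByTermFrequency($docs, 'David Casey'));
```

On PHP 8 (where sorting is stable) this gives Document 4 => 4, Document 2 => 2, Document 3 => 1, Document 5 => 1, with tied documents kept in input order. In practice `$docs` would be filled from the rows returned by the `SELECT`, for example by looping over `mysqli_fetch_assoc()`.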
I’ve read many tf-idf articles, but none of them helped me. I have no idea how to write the code.
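For comparison, tf-idf only adds a per-term weight on top of those counts: a term found in fewer documents counts for more. A sketch under the common textbook definition (raw count as tf, natural-log idf = ln(N / df)); `tfIdfRank` is a hypothetical name and the case-insensitive whole-word matching is an assumption:

```php
<?php
// Hypothetical tf-idf scorer:
//   score(d) = sum over query terms t of tf(t, d) * log(N / df(t))
// tf = raw count of t in d, df = number of documents containing t, N = number of documents.
// Assumes case-insensitive, whole-word matching on ASCII letters.
function tfIdfRank(array $docs, string $query): array
{
    $terms = preg_split('/\s+/', strtolower(trim($query)), -1, PREG_SPLIT_NO_EMPTY);

    // Word counts for every document.
    $tf = [];
    foreach ($docs as $name => $text) {
        preg_match_all('/[a-z]+/', strtolower($text), $m);
        $tf[$name] = array_count_values($m[0]);
    }

    $n = count($docs);
    $scores = [];
    foreach ($terms as $term) {
        // Document frequency: how many documents contain the term at least once.
        $df = 0;
        foreach ($tf as $counts) {
            if (isset($counts[$term])) {
                $df++;
            }
        }
        if ($df === 0) {
            continue; // the term occurs nowhere, so it adds nothing
        }

        $idf = log($n / $df); // natural log: rarer terms weigh more
        foreach ($tf as $name => $counts) {
            if (isset($counts[$term])) {
                $scores[$name] = ($scores[$name] ?? 0.0) + $counts[$term] * $idf;
            }
        }
    }

    arsort($scores); // highest score first
    return $scores;
}

$docs = [
    'Document 1' => 'John and Nancy are best friends.',
    'Document 2' => 'John, Casey, David, Nancy are best friends.',
    'Document 3' => 'Nancy and Casey are best friends.',
    'Document 4' => 'David is in a relationship with Casey. David and Casey are madly in love.',
    'Document 5' => 'David and John are siblings.',
];

print_r(tfIdfRank($docs, 'David Casey'));
```

With this data, ‘David’ and ‘Casey’ each appear in 3 of the 5 documents, so both get the same weight ln(5/3) ≈ 0.51 and the order matches the raw counts (Document 4 ≈ 2.04, Document 2 ≈ 1.02, Documents 3 and 5 ≈ 0.51 each). The weighting only changes the ranking when one query term is rarer than the other; a term present in every document gets weight 0.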
This is my current code:
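// Escape each term, then build "description LIKE '%term1%' OR description LIKE '%term2%' ..."
$searchTerms = array_map(function ($t) use ($dbc) { return mysqli_real_escape_string($dbc, $t); }, $searchTerms);
$searchCondition = "description LIKE '%" . implode("%' OR description LIKE '%", $searchTerms) . "%'";
$query = "SELECT description FROM table1 WHERE $searchCondition ORDER BY description ASC";
$result = mysqli_query($dbc, $query);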
…
…
…
This works for sure:
The rows from the database are then sorted in descending order of relevance, so the most relevant one comes first.
Note: this only works well when the number of keywords is small, because each keyword requires three length computations per row. So the response time on bigger tables with more keywords might be a bit different 😉
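Three length computations per keyword match a common MySQL counting trick: removing every occurrence of a term with `REPLACE()` and dividing the lost length by the term's length gives the number of occurrences. Below is a sketch that builds such a query from PHP, assuming the `table1` / `document` / `description` schema from the question; `buildRelevanceQuery`, `countOccurrences`, and the `$quote` callback are hypothetical names, and `addslashes()` appears only so the demo runs without a database connection.

```php
<?php
// Hypothetical query builder for the LENGTH/REPLACE counting trick. For one term,
//   (LENGTH(col) - LENGTH(REPLACE(col, term, ''))) / LENGTH(term)
// is the number of times the term occurs in col (case-sensitive, substring match).
// $quote must turn a term into a safely quoted SQL string literal; it is passed in
// so the builder itself needs no database connection.
function buildRelevanceQuery(array $terms, callable $quote): string
{
    if ($terms === []) {
        throw new InvalidArgumentException('at least one search term is required');
    }

    $parts = [];
    foreach ($terms as $term) {
        $q = $quote($term);
        $parts[] = "(LENGTH(description) - LENGTH(REPLACE(description, $q, ''))) / LENGTH($q)";
    }
    $relevance = implode(' + ', $parts);

    // The subquery lets the outer query filter and sort on the computed alias.
    return "SELECT * FROM ("
         . "SELECT document, description, $relevance AS relevance FROM table1"
         . ") AS ranked WHERE relevance > 0 ORDER BY relevance DESC";
}

// The same arithmetic in PHP, handy for checking the formula on one row.
// Assumes a non-empty $term (an empty one would divide by zero).
function countOccurrences(string $text, string $term): int
{
    return intdiv(strlen($text) - strlen(str_replace($term, '', $text)), strlen($term));
}

// Demo only: addslashes() is NOT a substitute for proper escaping on a real connection.
$sql = buildRelevanceQuery(['David', 'Casey'], fn (string $t): string => "'" . addslashes($t) . "'");
echo $sql, PHP_EOL;
```

With a real connection the quoting callback could wrap `mysqli_real_escape_string()`, e.g. `buildRelevanceQuery($searchTerms, fn ($t) => "'" . mysqli_real_escape_string($dbc, $t) . "'")`. Because `REPLACE()` matches substrings case-sensitively, ‘David’ would also be counted inside ‘Davidson’ but not in ‘david’.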