Let's say I have 5 documents as rows and 2 columns ‘document’ and ‘description’

Editorial Team

Asked: May 27, 2026

Let's say I have 5 documents stored as rows in a MySQL table with 2 columns, ‘document’ and ‘description’:

Document 1: John and Nancy are best friends.
Document 2: John, Casey, David, Nancy are best friends.
Document 3: Nancy and Casey are best friends.
Document 4: David is in relationship with Casey. David and Casey are madly in love.
Document 5: David and John are siblings.

So if the search query is “David Casey”, how can I score each document by how often the query terms occur in it, and rank the results by that frequency?

In this case, the result should be like this:

Document 4 (because there are 2 ‘David’ and 2 ‘Casey’)
Document 2 (1 ‘David’ and 1 ‘Casey’)
Document 3 (1 ‘Casey’)
Document 5 (1 ‘David’)
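To make the expected ranking concrete, here is a minimal PHP sketch that scores each document by summing the occurrences of every query term and then sorts by that score. It works on an in-memory array rather than the MySQL table, and the function names rankByTermFrequency and termFrequency are hypothetical. It assumes whole-word, case-insensitive matching and PHP 8 or later, where sorting is stable, so the tied Documents 3 and 5 keep their original order.

```php
<?php
// Sketch: rank documents by raw term frequency of the query terms.
// Assumptions (not from the question): whole-word, case-insensitive matching; PHP 8+.

// Hypothetical helper: count whole-word occurrences of $term in $text.
function termFrequency(string $text, string $term): int
{
    // \b...\b restricts the match to whole words; preg_quote escapes regex metacharacters.
    $pattern = '/\b' . preg_quote($term, '/') . '\b/i';
    return preg_match_all($pattern, $text);
}

// Hypothetical helper: score every document and sort by score, highest first.
function rankByTermFrequency(array $documents, string $query): array
{
    // Split the query on whitespace: "David Casey" -> ["David", "Casey"].
    $terms = preg_split('/\s+/', trim($query), -1, PREG_SPLIT_NO_EMPTY);

    $scores = [];
    foreach ($documents as $id => $text) {
        $score = 0;
        foreach ($terms as $term) {
            $score += termFrequency($text, $term);
        }
        if ($score > 0) {          // drop documents that contain no query term
            $scores[$id] = $score;
        }
    }
    arsort($scores);               // highest total first; stable in PHP 8, so ties keep input order
    return $scores;
}

$documents = [
    'Document 1' => 'John and Nancy are best friends.',
    'Document 2' => 'John, Casey, David, Nancy are best friends.',
    'Document 3' => 'Nancy and Casey are best friends.',
    'Document 4' => 'David is in relationship with Casey. David and Casey are madly in love.',
    'Document 5' => 'David and John are siblings.',
];

print_r(rankByTermFrequency($documents, 'David Casey'));
```

For the query “David Casey” this yields Document 4 (4), Document 2 (2), Document 3 (1) and Document 5 (1), the same order as the expected result; Document 1 scores 0 and is dropped.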

I’ve read many tf-idf articles, but none of them helped, and I have no idea how to write the code.
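Since tf-idf came up, here is a small PHP sketch of how inverse document frequency would weight the same counts. It assumes the common textbook variant tf = raw count and idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is how many documents contain t; other variants (smoothed idf, log-scaled tf) exist. The function names tfidfScores and countTerm are hypothetical. In this particular example both ‘David’ and ‘Casey’ occur in 3 of the 5 documents, so they get the same idf and the ranking is the same as with raw counts.

```php
<?php
// Sketch: tf-idf scoring with tf = raw count and idf = ln(N / df).
// Assumptions: whole-word, case-insensitive matching; PHP 8+ for stable sort.

// Hypothetical helper: whole-word, case-insensitive occurrence count.
function countTerm(string $text, string $term): int
{
    return preg_match_all('/\b' . preg_quote($term, '/') . '\b/i', $text);
}

// Hypothetical helper: tf-idf score per document, highest first.
function tfidfScores(array $documents, array $terms): array
{
    $n = count($documents);

    // idf per term, based on how many documents contain it at least once.
    $idf = [];
    foreach ($terms as $term) {
        $df = 0;
        foreach ($documents as $text) {
            if (countTerm($text, $term) > 0) {
                $df++;
            }
        }
        // A term that occurs nowhere cannot contribute, so give it weight 0.
        $idf[$term] = $df > 0 ? log($n / $df) : 0.0;
    }

    // Score = sum over query terms of tf * idf.
    $scores = [];
    foreach ($documents as $id => $text) {
        $score = 0.0;
        foreach ($terms as $term) {
            $score += countTerm($text, $term) * $idf[$term];
        }
        $scores[$id] = $score;
    }
    arsort($scores);
    return $scores;
}

$documents = [
    'Document 1' => 'John and Nancy are best friends.',
    'Document 2' => 'John, Casey, David, Nancy are best friends.',
    'Document 3' => 'Nancy and Casey are best friends.',
    'Document 4' => 'David is in relationship with Casey. David and Casey are madly in love.',
    'Document 5' => 'David and John are siblings.',
];

foreach (tfidfScores($documents, ['David', 'Casey']) as $id => $score) {
    printf("%s: %.3f\n", $id, $score);
}
```

Here Document 4 scores 4 × ln(5/3) ≈ 2.04. A term that occurs in every document gets idf = ln(1) = 0, which is how tf-idf down-weights very common words.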

This is my current code:

$searchCondition = "description LIKE '%" . implode("%' OR description LIKE '%", $searchTerms) . "%'";
$query = "SELECT description FROM table1 WHERE $searchCondition ORDER BY description ASC";
$result = mysqli_query($dbc, $query);

…

…

…

1 Answer

Editorial Team · Answer 1 · 2026-05-27T18:08:23+00:00

This works: count how many times each search term occurs in description and sort the rows by the total.

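// Escape the terms before putting them into the SQL string
$escapedTerms = array_map(function ($term) use ($dbc) {
    return mysqli_real_escape_string($dbc, $term);
}, $searchTerms);

// Keep only rows that contain at least one term
$searchCondition = "description LIKE '%" . implode("%' OR description LIKE '%", $escapedTerms) . "%'";

// Occurrences of a term = (length of text - length of text with the term removed) / length of term
$orderCondition = array();
foreach ($escapedTerms as $word) {
    $orderCondition[] = "(LENGTH(description) - LENGTH(REPLACE(description, '$word', ''))) / LENGTH('$word')";
}
$orderConditionString = "(" . implode(" + ", $orderCondition) . ")";

$query = "SELECT description FROM table1 WHERE $searchCondition ORDER BY $orderConditionString DESC";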
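To see what the database actually receives, here is a sketch that builds the same WHERE and ORDER BY clauses for the terms from the question and returns the SQL string, without needing a connection. The function name buildRankedQuery is hypothetical, and the terms are assumed to be trusted literals; real user input still has to be escaped (for example with mysqli_real_escape_string) before it is concatenated into SQL.

```php
<?php
// Sketch: build the ranked query for a list of search terms and print it.
// Assumption: the terms are trusted; escape real user input before concatenating.

function buildRankedQuery(array $searchTerms): string
{
    // WHERE: keep rows containing at least one term.
    $searchCondition = "description LIKE '%"
        . implode("%' OR description LIKE '%", $searchTerms) . "%'";

    // ORDER BY: sum of per-term occurrence counts.
    $orderCondition = [];
    foreach ($searchTerms as $word) {
        // occurrences = (characters removed by REPLACE) / (length of the term)
        $orderCondition[] = "(LENGTH(description) - LENGTH(REPLACE(description, '$word', ''))) / LENGTH('$word')";
    }
    $orderConditionString = '(' . implode(' + ', $orderCondition) . ')';

    return "SELECT description FROM table1 WHERE $searchCondition ORDER BY $orderConditionString DESC";
}

echo buildRankedQuery(['David', 'Casey']), "\n";
```

For ['David', 'Casey'] the WHERE clause becomes description LIKE '%David%' OR description LIKE '%Casey%', and the ORDER BY expression adds one LENGTH/REPLACE term per keyword.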

The rows are then sorted in descending order of that total, so the most relevant document comes first.
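The counting trick relies on a simple identity: removing every occurrence of a word shortens the text by (number of occurrences × length of the word). For example, Document 4 contains “David” twice, so removing it shortens the text by 2 × 5 = 10 characters, and 10 / 5 = 2. The sketch below emulates the query in plain PHP, with strlen and str_replace standing in for MySQL's LENGTH and REPLACE (both are case-sensitive substring operations). The function name occurrences is hypothetical, and PHP 8 is assumed so that ties keep their input order.

```php
<?php
// Sketch: emulate the answer's WHERE + ORDER BY in PHP.
// strlen/str_replace stand in for MySQL LENGTH/REPLACE (case-sensitive substring matching).

// Hypothetical helper: (length before - length after removing $word) / length of $word.
function occurrences(string $text, string $word): int
{
    return intdiv(strlen($text) - strlen(str_replace($word, '', $text)), strlen($word));
}

$documents = [
    'Document 1' => 'John and Nancy are best friends.',
    'Document 2' => 'John, Casey, David, Nancy are best friends.',
    'Document 3' => 'Nancy and Casey are best friends.',
    'Document 4' => 'David is in relationship with Casey. David and Casey are madly in love.',
    'Document 5' => 'David and John are siblings.',
];
$terms = ['David', 'Casey'];

$scores = [];
foreach ($documents as $id => $text) {
    $total = 0;
    foreach ($terms as $word) {
        $total += occurrences($text, $word);
    }
    // Approximates the WHERE clause: keep rows where at least one term occurs.
    // (MySQL LIKE is case-insensitive under the default collations; str_replace is not.)
    if ($total > 0) {
        $scores[$id] = $total;
    }
}
arsort($scores); // equivalent of ORDER BY ... DESC

foreach ($scores as $id => $total) {
    echo "$id: $total\n";
}
```

This prints Document 4: 4, Document 2: 2, Document 3: 1 and Document 5: 1, the order the question asks for.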

Note: this works well only when the number of keywords is small, because each keyword adds three LENGTH calls (plus a REPLACE) that are evaluated for every matching row. On bigger tables and with more keywords, the response time may be noticeably longer 😉
