I need some help with this issue: As an input I have a string,


Asked: June 13, 2026 (2026-06-13T10:00:21+00:00)


I need some help with this issue:

As an input I have a string, which looks like Blue cat green eyes 2342342, or it can be Cat blue eyes green 23242 or any other permutation of words.

In my DB table I have some data. One of the columns is called, say, keyWords.

Here is an example of this table:

[image: example rows of the table]

My task is to find the record whose KEYWORDS column matches some of the words from the input string.

For example: for the strings “Blue cat green eyes 2342342”, “Cat blue eyes green 23242” and “Cat 23242 eyes blue green” the result must be “blue cat” (the first row of my table).

The only way I can imagine solving this task looks like this:

1. Take each word from the string in turn.
2. Search for that word with LIKE '%word%' in the table column.
3. If it is not found, the word is not a keyword and we have no interest in it.
4. If it is found exactly once, great! No doubt, this is what we are looking for.
5. If there is more than one result:
   a. From the words in the string that have not been tested yet, take each word in turn.
   b. Search for that word with LIKE '%word%' in the results from step 2.
   c. etc…
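
For reference, the steps above could be sketched roughly as follows. This is only an illustration: `narrow_by_words` and the in-memory `keyword_rows` list are hypothetical stand-ins for the real table, and the substring test plays the role of LIKE '%word%'.

```python
def narrow_by_words(sentence, keyword_rows):
    """Narrow the candidate rows word by word, as in the numbered steps.

    keyword_rows is a hypothetical in-memory stand-in for the KEYWORDS column.
    """
    candidates = list(keyword_rows)
    matched_any = False
    for word in sentence.lower().split():
        # Stand-in for: ... WHERE keyWords LIKE '%word%' over the current candidates
        hits = [row for row in candidates if word in row.lower()]
        if not hits:
            continue  # not a keyword, so it is ignored
        matched_any = True
        candidates = hits
        if len(candidates) == 1:
            break  # exactly one row left: this is the match
    return candidates if matched_any else []


# Assumed sample rows; only "blue cat" comes from the question itself.
sample_rows = ["blue cat", "black cat", "angry birds"]
result = narrow_by_words("Cat 23242 eyes blue green", sample_rows)
```

Each pass is a substring scan over the remaining candidates, and a LIKE pattern with a leading wildcard generally cannot use an ordinary index, which is where the slowness comes from.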

Graphical schema of this algorithm is here

But it looks like this algorithm will be very slow if there are a lot of records in the table and if my input string consists of a large number of words.

So, my question is: are there any special algorithms that can help solve this task?


1 Answer

Editorial Team · Answer 1 · 2026-06-13T10:00:22+00:00

You can add another table, such as

ID    KeywordID     Word
1     1             blue
2     2             blue
3     1             cat

and transform the string

"Blue cat green eyes 2342342"

in a series of indexes and counts:

SELECT KeywordID, COUNT(*) FROM ancillary WHERE Word IN ('blue','cat','green','eyes'...) GROUP BY KeywordID

This would perform a series of exact matches and return, say,

KeywordID   Count
1           2
2           1

Then you know that the keyword group with ID 1 has two words, which means that a count of 2 matches all of them, so KeywordID 1 is satisfied. Group 2 also has two words, but only one of them was found, so the match is there but not complete.

If you also record the keyword set size together with keyword ID, then all keywords from the same ID will have the same KeywordSize, and you can GROUP BY it too:

KeywordID   KeywordSize    Count
1           2              2
2           2              1

and you can even SELECT COUNT(*)/KeywordSize AS match ... ORDER BY match to have the keyword matches sorted by relevance.

Of course, once you have KeywordID, you can find it in the keywords table.
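
A minimal sketch of that idea, assuming SQLite through Python's sqlite3 module (any SQL engine should behave similarly). The second word of group 2 ('bird') and the column aliases `hits` and `relevance` are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ancillary ("
    "ID INTEGER PRIMARY KEY, KeywordID INTEGER, KeywordSize INTEGER, Word TEXT)"
)
conn.executemany(
    "INSERT INTO ancillary (KeywordID, KeywordSize, Word) VALUES (?, ?, ?)",
    # Group 1 is 'blue cat'; the word 'bird' in group 2 is a made-up filler.
    [(1, 2, "blue"), (1, 2, "cat"), (2, 2, "blue"), (2, 2, "bird")],
)

# Lower-case and split the input so it can be compared with exact matches.
words = "Blue cat green eyes 2342342".lower().split()
placeholders = ", ".join("?" for _ in words)

rows = conn.execute(
    f"""
    SELECT KeywordID, KeywordSize, COUNT(*) AS hits,
           COUNT(*) * 1.0 / KeywordSize AS relevance
    FROM ancillary
    WHERE Word IN ({placeholders})
    GROUP BY KeywordID, KeywordSize
    ORDER BY relevance DESC
    """,
    words,
).fetchall()
```

The `* 1.0` is there because some engines, SQLite included, do integer division when both operands are integers, which would turn every partial match into 0.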

Implementation

You want to add the keyword list “black angry cat” to your existing table.

So you explode this keyword list into words and get “black”, “angry” and “cat”.

You insert the keyword list normally in the table that you already have, and retrieve the ID for that newly created row, let’s say it is 1701.

Now you insert the words into a new table that we call “ancillary”. This table only contains the keyword row ID of your primary table, the single word, and the size of the word list from which that word comes.

We know we are inserting 3 words in all, for table row 1701, so size=3 and we insert these tuples:

(1701, 3, 'black')
(1701, 3, 'cat')
(1701, 3, 'angry')

(These will receive a unique ID of their own, but this does not concern us.)
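
The insertion step might look like this, again assuming SQLite via Python. The helper name `add_keyword_list` and the exact table definitions are assumptions; in a fresh database the new row gets ID 1 rather than 1701.

```python
import sqlite3


def add_keyword_list(conn, keyword_list):
    """Insert a keyword list into mytable and explode it into ancillary."""
    # Lower-case, split on whitespace, and drop repeated words (order kept).
    words = list(dict.fromkeys(keyword_list.lower().split()))
    cur = conn.execute("INSERT INTO mytable (keyWords) VALUES (?)", (keyword_list,))
    keyword_id = cur.lastrowid  # ID of the newly created row
    conn.executemany(
        "INSERT INTO ancillary (KeywordID, ListSize, Word) VALUES (?, ?, ?)",
        [(keyword_id, len(words), word) for word in words],
    )
    conn.commit()
    return keyword_id


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (KeywordID INTEGER PRIMARY KEY, keyWords TEXT)")
conn.execute(
    "CREATE TABLE ancillary ("
    "ID INTEGER PRIMARY KEY, KeywordID INTEGER, ListSize INTEGER, Word TEXT)"
)

new_id = add_keyword_list(conn, "black angry cat")
stored = conn.execute(
    "SELECT KeywordID, ListSize, Word FROM ancillary ORDER BY ID"
).fetchall()
```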

Now some time later we receive a sentence which is,

'Schroedinger cat is black and angry'

We could first filter the sentence against a list of stop words to be removed, such as “is” and “and”. But this is not necessary.

Then we could run as many queries as there are words, and thereby discover that no rows anywhere contained “Schroedinger” and we can drop it. But this, too, is not necessary.

Finally we build the real query against ancillary:

SELECT KeywordID, COUNT(*) AS total, COUNT(*)*100/ListSize AS match
    FROM ancillary WHERE Word IN ('Schroedinger','cat','is','black','and','angry')
    GROUP BY KeywordID, ListSize;

The WHERE will return, say, these rows:

(1234, 'black') -- from 'black cat'
(1234, 'cat')   -- from 'black cat'
(1423, 'angry') -- from 'angry birds'
(1701, 'cat')   -- from 'black angry cat'
(1701, 'angry') -- from 'black angry cat'
(1701, 'black') -- from 'black angry cat'
(1999, 'cat')   -- from 'nice white cat'

So the GROUP BY will return the KeywordID of these rows with its cardinality and match percentage:

1423   1   50%
1701   3  100%
1234   2  100%
1999   1   33%

Now you can sort by matching ratio descending, and then by list size descending (since matching 100% of 3 words is better than matching 100% of 2, and matching 1 in 2 is better than matching 1 in 3):

1701   3  100% -- our best match
1234   2  100% -- runner-up
1423   1   50%
1999   1   33%
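
To check the ranking, here is a small end-to-end sketch with the four keyword lists from the example, assuming SQLite via Python. The alias `match_pct` replaces `match` only because MATCH is a reserved word in some engines, and the percentage is computed as matched words over list size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ancillary (KeywordID INTEGER, ListSize INTEGER, Word TEXT)")

# The four keyword lists named in the comments of the example rows.
keyword_lists = {
    1234: "black cat",
    1423: "angry birds",
    1701: "black angry cat",
    1999: "nice white cat",
}
for keyword_id, text in keyword_lists.items():
    words = text.split()
    conn.executemany(
        "INSERT INTO ancillary VALUES (?, ?, ?)",
        [(keyword_id, len(words), word) for word in words],
    )

sentence = "Schroedinger cat is black and angry"
tokens = sentence.lower().split()
placeholders = ", ".join("?" for _ in tokens)

ranked = conn.execute(
    f"""
    SELECT KeywordID, COUNT(*) AS total, COUNT(*) * 100 / ListSize AS match_pct
    FROM ancillary
    WHERE Word IN ({placeholders})
    GROUP BY KeywordID, ListSize
    ORDER BY match_pct DESC, total DESC
    """,
    tokens,
).fetchall()
```

Binding the words as parameters instead of pasting them into the SQL string also avoids quoting problems with user input.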

You can also retrieve your first table in one query, with added match ratio:

SELECT mytable.*, total, match FROM
mytable JOIN (
SELECT KeywordID, COUNT(*) AS total, COUNT(*)*100/ListSize AS match
    FROM ancillary WHERE Word IN ('Schroedinger','cat','is','black','and','angry')
    GROUP BY KeywordID, ListSize
) AS ancil ON (mytable.KeywordID = ancil.KeywordID)
ORDER BY match DESC, total DESC;

The largest cost is the exact-match lookup in “ancillary”, which should therefore be indexed on the Word column.
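
As a rough way to verify the index is picked up, one could create it and inspect the query plan. This sketch assumes SQLite via Python; the index name `idx_ancillary_word` is made up, and other engines have their own EXPLAIN syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ancillary (KeywordID INTEGER, ListSize INTEGER, Word TEXT)")
# Index on the column used by the exact-match IN (...) lookup.
conn.execute("CREATE INDEX idx_ancillary_word ON ancillary (Word)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT KeywordID, COUNT(*) FROM ancillary "
    "WHERE Word IN (?, ?) GROUP BY KeywordID",
    ("black", "cat"),
).fetchall()

# The last column of each plan row holds the human-readable description.
plan_text = " | ".join(str(row[-1]) for row in plan)
uses_index = "idx_ancillary_word" in plan_text
```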
