Suppose I have a string:
string1 = "hello hi goodmorning evening [...]"
and I have some short keyword strings:
compare1 = "hello evening"
compare2 = "hello hi"
I need a function that returns the affinity between the text and keywords. Example:
function(string1,compare1); // returns: 4
function(string1,compare2); // returns: 5 (more relevant)
Please note that 5 and 4 are just example values.
You could say "write a function that counts occurrences", but that would not work for this example: both keyword strings have 2 occurrences, yet compare1 is less relevant because “hello evening” does not appear verbatim in string1 (the words hello and evening are farther apart than hello and hi).
Is there a known algorithm for this?
ADD1:
Algorithms like edit distance would NOT work in this case, because string1 is a complete text (around 300-400 words) and the comparison strings are at most 4-5 words long.
A Dynamic Programming Algorithm
It seems what you are looking for is very similar to what the Smith–Waterman algorithm does.
From Wikipedia:
Let’s see a practical example, so you can evaluate its usefulness.
Suppose we have a text:
I isolated the segment we are going to match, just for ease of reading.
We will compare the affinity (or similarity) with a list of strings:
I have the algorithm already implemented, so I’ll calculate the similarity and normalize the results:
Then we plot the results:
I think it’s very similar to your expected result.
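For readers who want to try the idea, here is a minimal Python sketch of Smith–Waterman local alignment applied to words rather than characters. It is not the implementation used for the results above. The names `smith_waterman` and `affinity`, the scoring values (match = 2, mismatch = -1, gap = -1), and the normalization by the best achievable score are all assumptions chosen for illustration.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the DP table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Extend the alignment diagonally with a match or a mismatch.
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def affinity(text, keywords, match=2):
    """Word-level score divided by the score of a perfect match (0.0 to 1.0)."""
    t, k = text.lower().split(), keywords.lower().split()
    if not k:
        return 0.0
    return smith_waterman(t, k, match=match) / (match * len(k))


# Shortened version of the question's text (assumed for the demo).
string1 = "hello hi goodmorning evening"
print(affinity(string1, "hello hi"))       # adjacent words, higher score
print(affinity(string1, "hello evening"))  # words far apart, lower score
```

With these scoring values, "hello hi" aligns as one contiguous run, while "hello evening" pays gap penalties for the words in between, so it ranks lower even though both keyword strings have two occurrences. Tuning the gap penalty changes how quickly distance between the words reduces the score.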
HTH!
Some implementations (with source code)
(GSW)
(presentation)
applet