I know how to compare two texts and get all the single words that appear in both. But how can I match expressions/phrases?
For example:
1. “This is the computer maker Apple”
2. “Apple is a California based great computer maker”
Now 🙂
- Apple is clearly present in both.
- computer and maker are present in both. At this point I could check whether they form a group of words (one directly follows the other).
But for the sake of processing speed, isn’t there a way to match “computer maker” directly, rather than matching each word and then checking whether they appear as a group?
Keep in mind that the example given is trivial and only meant as an illustration; in practice, more complicated sentences/texts may be presented.
You could parse both strings and split on whitespace to get token arrays A1 and A2. Then simply check every contiguous subsequence in A1 for a matching one in A2. This looks like O(n^4) to me in the worst case (O(n^2) subsequences in A1, O(n) candidate start positions in A2, and O(n) to compare each pair), which is better than collecting all the single-word matches and then looking for combinations… which is not polynomial.
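To make the idea concrete, here is a minimal Python sketch of that brute-force check. The function name `common_phrases_bruteforce` and the `min_len` parameter are made up for illustration, and I'm assuming lowercasing plus a plain whitespace split is good enough as tokenization (punctuation is not handled). As a shortcut, the runs of A2 are stored in a set of tuples instead of being rescanned for every run of A1.

```python
def common_phrases_bruteforce(text1, text2, min_len=2):
    """Return every contiguous word run of at least min_len words found in both texts."""
    # Assumption: lowercase + whitespace split is an adequate tokenizer here.
    a1 = text1.lower().split()
    a2 = text2.lower().split()

    # Every contiguous run of A2 with at least min_len tokens, as hashable tuples.
    runs2 = {
        tuple(a2[i:j])
        for i in range(len(a2))
        for j in range(i + min_len, len(a2) + 1)
    }

    found = set()
    # Enumerate every contiguous run of A1 and look it up among the runs of A2.
    for i in range(len(a1)):
        for j in range(i + min_len, len(a1) + 1):
            run = tuple(a1[i:j])
            if run in runs2:
                found.add(" ".join(run))
    return found


s1 = "This is the computer maker Apple"
s2 = "Apple is a California based great computer maker"
print(common_phrases_bruteforce(s1, s2))  # {'computer maker'}
```

Note that this returns every shared run, so a shared three-word phrase would also be reported as its two-word sub-phrases.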
Recursion seems like an elegant way to implement something like this. If you need something more efficient, I’m sure there is a smarter way to do it.
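One candidate for a smarter way, offered as a sketch rather than the definitive answer, is the classic dynamic-programming table used for the longest common substring problem, applied to words instead of characters. It runs in O(n·m) time for n and m tokens. This is a different technique from the brute-force check described above. The name `maximal_common_phrases` and the `min_len` parameter are hypothetical, and tokenization is again assumed to be lowercase plus whitespace split.

```python
def maximal_common_phrases(text1, text2, min_len=2):
    """Return the maximal shared word runs (not extendable left or right)."""
    # Assumption: lowercase + whitespace split is an adequate tokenizer here.
    a1 = text1.lower().split()
    a2 = text2.lower().split()

    # prev[j] holds the length of the common run ending at a1[i-2] and a2[j-1];
    # only the previous row of the DP table is kept to save memory.
    prev = [0] * (len(a2) + 1)
    found = set()

    for i in range(1, len(a1) + 1):
        cur = [0] * (len(a2) + 1)
        for j in range(1, len(a2) + 1):
            if a1[i - 1] == a2[j - 1]:
                # Extend the run that ended one token earlier in both texts.
                cur[j] = prev[j - 1] + 1
                # The run cannot grow further if either text ends here
                # or the next tokens differ.
                run_ends = i == len(a1) or j == len(a2) or a1[i] != a2[j]
                if run_ends and cur[j] >= min_len:
                    found.add(" ".join(a1[i - cur[j]:i]))
        prev = cur
    return found


t1 = "This is the computer maker Apple"
t2 = "Apple is a California based great computer maker"
print(maximal_common_phrases(t1, t2))  # {'computer maker'}
```

With `min_len=1` the same call also reports the single shared words that are not part of a longer shared run. If you'd rather not write this yourself, Python's standard `difflib.SequenceMatcher` accepts lists of tokens and exposes matching blocks, though its matching is based on a different heuristic.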