I’m trying to find a good fuzzy string matching algorithm. Direct matching doesn’t work for me, because the match fails unless my strings are 100% identical. Levenshtein distance doesn’t work well for my strings either, as it operates at the character level. I’m looking for something along the lines of word-level matching, e.g.:
String A: The quick brown fox.
String B: The quick brown fox jumped over the lazy dog.
These should match, as all words in string A are in string B.
Now, this is an oversimplified example, but would anyone know a good fuzzy string matching algorithm that works at the word level?
I like Drew’s answer.
You can use difflib to find the longest match:
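A minimal sketch of that idea, assuming the strings are split into lowercase word tokens with a simple regex so that `SequenceMatcher` compares words rather than characters. The helper names `words` and `longest_word_match` are illustrative and not part of difflib:

```python
import re
from difflib import SequenceMatcher

def words(text):
    # Assumption: a "word" is a run of letters, digits, or underscores; case is ignored.
    return re.findall(r"\w+", text.lower())

def longest_word_match(a, b):
    """Return the longest run of consecutive words shared by a and b."""
    wa, wb = words(a), words(b)
    # SequenceMatcher works on any sequence of hashable items, including lists of words.
    # autojunk=False turns off the popularity heuristic used for long sequences.
    sm = SequenceMatcher(None, wa, wb, autojunk=False)
    m = sm.find_longest_match(0, len(wa), 0, len(wb))
    return wa[m.a:m.a + m.size]

a = "The quick brown fox."
b = "The quick brown fox jumped over the lazy dog."
print(longest_word_match(a, b))  # ['the', 'quick', 'brown', 'fox']
```

Here the longest match covers every word of string A, so A can be treated as contained in B.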
Or pick some minimum matching threshold. Example:
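One way this could look, again as a sketch rather than a definitive implementation: sum the sizes of all matching word blocks and compare the fraction of string A's words that were matched against a cutoff. The function names and the default threshold of 0.8 are assumptions chosen for illustration:

```python
import re
from difflib import SequenceMatcher

def words(text):
    # Assumption: a "word" is a run of letters, digits, or underscores; case is ignored.
    return re.findall(r"\w+", text.lower())

def word_match_fraction(a, b):
    """Fraction of the words in a that appear, in order, in b."""
    wa, wb = words(a), words(b)
    if not wa:
        return 0.0
    sm = SequenceMatcher(None, wa, wb, autojunk=False)
    # get_matching_blocks() ends with a zero-size sentinel, which adds nothing to the sum.
    matched = sum(block.size for block in sm.get_matching_blocks())
    return matched / len(wa)

def is_fuzzy_match(a, b, threshold=0.8):
    # threshold is a hypothetical default; tune it for your data.
    return word_match_fraction(a, b) >= threshold

b = "The quick brown fox jumped over the lazy dog."
print(word_match_fraction("The quick brown fox.", b))  # 1.0
print(word_match_fraction("The quick red fox.", b))    # 0.75
print(is_fuzzy_match("The quick red fox.", b))         # False
```

Dividing by the length of string A makes the score asymmetric, which fits the "all words in A are in B" requirement. If a symmetric similarity is wanted instead, `SequenceMatcher.ratio()` on the same word lists is an alternative.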