I am looking for a simple way (a UDF?) to measure the similarity between strings. The SOUNDEX and DIFFERENCE functions do not seem to do the job.
Similarity should be based on the number of characters in common (order matters).
For example:
Spiruroidea sp. AM-2008
and
Spiruroidea gen. sp. AM-2008
should be recognised as similar.
Any pointers would be very much appreciated.
Thanks.
Christian
You may want to consider implementing the Levenshtein Distance algorithm as a UDF, so that it will return the number of operations that need to be performed on String A in order for it to become String B. This is often referred to as the edit distance.
You can then compare the result of the Levenshtein Distance function against a fixed threshold, or against a percentage of the length of String A or String B.
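To make the idea concrete, here is a minimal sketch of the edit distance calculation and a length-relative threshold check. It is written in Python rather than T-SQL purely to show the logic a UDF would need to carry; the names `levenshtein`, `is_similar` and `max_ratio`, and the 0.25 default, are my own assumptions for illustration and would need tuning for your data.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the shorter string as the row
    # previous[j] is the distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        previous = current
    return previous[-1]


def is_similar(a: str, b: str, max_ratio: float = 0.25) -> bool:
    """Hypothetical helper: treat two strings as similar when the edit
    distance is at most max_ratio of the longer string's length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True  # two empty strings are identical
    return levenshtein(a, b) / longest <= max_ratio


if __name__ == "__main__":
    s1 = "Spiruroidea sp. AM-2008"
    s2 = "Spiruroidea gen. sp. AM-2008"
    print(levenshtein(s1, s2), is_similar(s1, s2))
```

For the two strings in the question, the only difference is the inserted "gen. ", so the distance is 5 against a longest length of 28, which passes a 25% threshold. A ratio against the longer string is only one possible choice; a fixed cut-off may suit short names better.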
You would simply use it as follows:
You may want to check out the following Levenshtein Distance implementation for SQL Server: