I have a legacy DB with:
firstname, lastname, address1, address2, address3, address4, zipcode
The data is scattered across the columns with no consistency, e.g. the actual zipcode could be in any column, and there are plenty of typos.
Is there a way I could use something like SOUNDEX / DIFFERENCE in a SP to loop through everything and return an ordered list of likely duplicates?
[it doesn’t need to be fast]
If you are using SQL Server 2005 or above, you can use fuzzy matching in SSIS (the Fuzzy Grouping and Fuzzy Lookup transformations) to do this task. I got significantly better results this way than by looking for SOUNDEX matches or writing my own SQL code to look for near matches.
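If SSIS is not an option, the same general idea can be tried outside the database. The following is only a rough sketch, not what SSIS does internally: it assumes the rows have been exported as tuples keyed by some row id, and it uses Python's standard difflib rather than SOUNDEX. The function names, the sample rows and the 0.85 threshold are all made up for illustration. Because the data can sit in any column, every field is merged into one sorted bag of tokens before comparing.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(fields):
    """Collapse all columns of a row into one comparable string.

    Lower-cases everything, replaces punctuation with spaces, and sorts the
    tokens so it does not matter which column a value was typed into.
    """
    tokens = []
    for value in fields:
        if value:  # skip NULL / empty columns
            cleaned = "".join(
                ch if ch.isalnum() else " " for ch in str(value).lower()
            )
            tokens.extend(cleaned.split())
    return " ".join(sorted(tokens))


def likely_duplicates(rows, threshold=0.85):
    """Return (score, id_a, id_b) for similar rows, best matches first.

    Compares every pair of rows, so it is O(n^2): acceptable only because
    speed is not a requirement here. The threshold is an arbitrary guess.
    """
    keys = [(row_id, normalize(fields)) for row_id, fields in rows.items()]
    pairs = []
    for (id_a, key_a), (id_b, key_b) in combinations(keys, 2):
        score = SequenceMatcher(None, key_a, key_b).ratio()
        if score >= threshold:
            pairs.append((score, id_a, id_b))
    pairs.sort(reverse=True)
    return pairs


# Hypothetical sample data: rows 1 and 2 are the same person, with a typo
# in the first name and the zipcode stored in different columns.
sample_rows = {
    1: ("John", "Smith", "12 High St", "London", None, None, "SW1A 1AA"),
    2: ("Jon", "Smith", "12 High Street", "SW1A 1AA", "London", None, None),
    3: ("Mary", "Jones", "4 Elm Road", "Leeds", None, None, "LS1 4AB"),
}

for score, id_a, id_b in likely_duplicates(sample_rows):
    print(f"{id_a} ~ {id_b}: {score:.2f}")
```

The output is an ordered list of candidate pairs for a human to review, not a definitive verdict. The threshold will need tuning against the real data, and for a large table you would probably want to compare only rows that share something cheap, such as the same SOUNDEX of the last name, rather than every pair.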