I have two databases in Excel. Database A holds the names of various companies, cities, and charities. Database B holds the same kind of names, but it is filled out by the customer, so I get a lot of random mistakes and variations on the legal name.
What is the best way to match the names?
Here are some sample differences:
- City of ABC might show up as Corporation of the City of ABC
- ABC Corporation might show up as ABCcorporation (they forgot a space)
- University of ABC may be abbreviated as Univ of ABC
- Canadian Tire might show up as Canadian Tire Store #503
- Canadian Tire might be misspelled, e.g. Canadia Tire
- ABC Corp might show up as ABC Inc
Is there a good solution to this? I know this question is a bit of a long shot, but if I can do this I will save people in my company thousands of hours each year.
Any advice will be greatly appreciated.
This is a very complex problem. Look up “master data management” and “dedup” (deduplication). This Wikipedia article is a good starting point.
The problem is best solved in small chunks. My recommendation is to read up a little, then implement a tool that lists potential duplicates and gives you an easy way to merge them. The keyword here is potential: false positives are very likely, and a wrong merge is very harmful, so have a person confirm each merge instead of merging automatically.
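As a rough illustration of such a "list potential duplicates" tool, here is a minimal Python sketch using only the standard library. Everything in it is an assumption made for the example: the function names (normalize, similarity, potential_matches), the abbreviation and noise-word lists, the "Store #503" pattern, and the 0.85 threshold are all hypothetical and would need tuning against your real data. It only proposes candidate pairs for review; it never merges anything.

```python
import re
from difflib import SequenceMatcher

# Hypothetical lookup tables for this sketch; extend them from your own data.
ABBREVIATIONS = {"univ": "university", "corp": "corporation",
                 "inc": "incorporated", "ltd": "limited"}
NOISE_WORDS = {"the", "of", "corporation", "incorporated", "limited",
               "company", "co"}
# Long legal suffixes that may be glued onto a name ("ABCcorporation").
GLUED_SUFFIXES = ("corporation", "incorporated", "limited")


def normalize(name):
    """Reduce a name to a rough comparison key (lossy on purpose)."""
    text = name.lower()
    # Drop branch markers such as "Store #503" (assumed pattern).
    text = re.sub(r"\bstore\s*#?\s*\d+", " ", text)
    # Treat anything that is not a letter or digit as a separator.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    tokens = []
    for token in text.split():
        token = ABBREVIATIONS.get(token, token)
        for suffix in GLUED_SUFFIXES:
            if token.endswith(suffix) and len(token) > len(suffix):
                token = token[: -len(suffix)]
                break
        if token not in NOISE_WORDS:
            tokens.append(token)
    return " ".join(tokens)


def similarity(a, b):
    """Score two names from 0.0 to 1.0 based on their normalized keys."""
    key_a, key_b = normalize(a), normalize(b)
    if not key_a or not key_b:
        return 0.0  # nothing left to compare, so never suggest a match
    return SequenceMatcher(None, key_a, key_b).ratio()


def potential_matches(customer_names, master_names, threshold=0.85):
    """List (customer name, master name, score) pairs for human review."""
    candidates = []
    for customer in customer_names:
        for master in master_names:
            score = similarity(customer, master)
            if score >= threshold:
                candidates.append((customer, master, round(score, 2)))
    return sorted(candidates, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    master = ["City of ABC", "ABC Corporation", "University of ABC",
              "Canadian Tire"]
    customer = ["Corporation of the City of ABC", "ABCcorporation",
                "Univ of ABC", "Canadian Tire Store #503", "Canadia Tire"]
    for row in potential_matches(customer, master):
        print(row)
```

Note that the normalization deliberately throws information away: "ABC Corp" and "ABC Inc" end up with the same key even though they could be different legal entities, which is exactly why the output should be treated as a review list. The sketch also compares every name in B against every name in A, which is fine for a few thousand rows but would need some form of pre-grouping on larger lists.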