I have a column that is being used to list competitors names in a table I’m putting together. Right now don’t have a lot of control over how these inputs are made, and it causes some serious headaches. There are random spaces and misspellings throughout our data, and yet we need to list the data by competitor.
As an example (not actual SQL I’m using), list of competitors:
Price Cutter
PriceCutter
PriceCuter
Price Cuter
If I ran the query:
SELECT Competitor_Name, SUM(Their_Sales),
FROM Cmdata.Competitors
Where Their_Sales
Between 10000 AND 100000000
Group by Competitor_Name
I would get a different entry for each version of Price Cutter, something I clearly want to avoid.
I would think this problem would come up a lot, but I did a Google search and came up dry. I will admit, the question is kind of hard to articulate in a few words, maybe that’s why I didn’t come with anything. Either that or this is so basic I should already know…
(PS- Yes, we’re moving to a drop down menu, but it’s gonna take some time. In the mean time, is there a solution?)
You need to add a
Competitortable, that has a standard name for each competitor.Then, use foreign key references in other tables.
The problem that you are facing is a data cleansing and data modeling issue. It is not particularly hard to solve, but it does require a fair amount of work. You can get started by getting a list of all the current spellings and standardize them — probably in an Excel spreadsheet.
If you do that, you can then create a lookup table and change the values by looking them up.
However, in the medium term, you should be creating a
Competitortable and modelling the data in the way that your application needs.