I have a table Stores and a table Schools. This is a one-to-many relationship: multiple schools can be served by the same store, but not vice versa.
Early in development, I made the mistake of repeating the same store multiple times in the Stores table. I inserted rows like:
Store_ID | Store_URL
1 | http://sameurl.com
2 | http://sameurl.com
And then if two different schools were at that same store, I’d be referencing 1 in one school row, and 2 in another.
I’m able to identify duplicates quite easily by using GROUP BY on Store_URL together with COUNT().
The difficult task ahead of me is making all the Schools point to non-duplicate Stores. If I simply delete duplicate Stores, I’ll have Schools which point to nonexistent rows.
What can I do to eliminate duplicates and make schools that share the same store point to the same Store row?
Note: there are thousands of schools and stores. Manual solutions don’t work.
Assuming your Schools table has a store_ID column, from what you’ve said.
I would start by figuring out, for each duplicate, which store_ID you want to keep. I will also assume that you want it to be the lowest ID value. I would then update the Schools’ store_ID to be the MIN(store_ID) for the current URL they have. You should then be free to delete the extra store_ID records.
This is how I would go about the update:
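A minimal sketch of that update, assuming the tables are Stores(Store_ID, Store_URL) as in the question and that Schools has a Store_ID column referencing Stores.Store_ID (that column name is an assumption). It also assumes duplicates are exact matches on Store_URL. It is written in fairly portable SQL, but test it on a copy of the data first, ideally inside a transaction.

```sql
-- Repoint every school at the lowest Store_ID that shares its current store's URL.
-- Assumption: Schools.Store_ID references Stores.Store_ID.
UPDATE Schools
SET Store_ID = (
    SELECT MIN(keep.Store_ID)
    FROM Stores AS cur
    JOIN Stores AS keep
      ON keep.Store_URL = cur.Store_URL   -- all stores with the same URL
    WHERE cur.Store_ID = Schools.Store_ID -- the store this school points at now
)
-- Only touch schools whose current store exists and has a URL,
-- so the subquery can never yield NULL.
WHERE EXISTS (
    SELECT 1
    FROM Stores AS s
    WHERE s.Store_ID = Schools.Store_ID
      AND s.Store_URL IS NOT NULL
);
```

With the sample rows from the question, a school pointing at store 2 would end up pointing at store 1, while schools already pointing at store 1 are unchanged.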
If you are able to delete stores that do not have an associated school, the following query will remove the extra rows:
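A sketch of that cleanup, under the same assumed column names (Schools.Store_ID referencing Stores.Store_ID). Run it only after the schools have been repointed, and note that it removes every unreferenced store, not just the duplicates:

```sql
-- Delete any store that no school references.
DELETE FROM Stores
WHERE NOT EXISTS (
    SELECT 1
    FROM Schools AS sc
    WHERE sc.Store_ID = Stores.Store_ID
);
```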
If you only want to delete the duplicate Stores records, I would look at this query instead of the above:
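A sketch of that narrower delete, again assuming the Store_ID and Store_URL column names from the question. It keeps the lowest Store_ID for each URL and removes the rest; stores with a unique URL are left alone even if no school uses them. The inner SELECT is wrapped in a derived table because some databases (MySQL, for example) refuse a subquery that reads directly from the table being deleted from.

```sql
-- Keep MIN(Store_ID) per Store_URL and delete the other rows.
DELETE FROM Stores
WHERE Store_URL IS NOT NULL               -- leave rows without a URL untouched
  AND Store_ID NOT IN (
      SELECT keep_id
      FROM (
          SELECT MIN(Store_ID) AS keep_id
          FROM Stores
          GROUP BY Store_URL
      ) AS keepers
  );
```

As with the previous query, this is only safe once the schools have been updated to the lowest Store_ID; otherwise some Schools rows will point at deleted stores.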