I have to clean up records from a table that doesn’t have a primary key or a unique constraint.
Table definition:
create table person(
name text,
staff_id integer,
work_code text,
location text
);
Unsurprisingly, it contains a lot of duplicates and partial duplicates.
What is the best way to transform the records into a unique set? I don’t have to care about other columns besides name and staff_id.
As you
This could be your procedure to clean up the table:
1.) Create a temporary table of unique rows:
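A minimal sketch of this step, assuming PostgreSQL (DISTINCT ON is a PostgreSQL extension) and a scratch table named tmp, which is only an illustrative name:

```sql
-- "tmp" is a hypothetical name for the scratch table.
-- DISTINCT ON keeps one row per (name, staff_id); the ORDER BY decides
-- which one: here the row with the smallest work_code.
CREATE TEMP TABLE tmp AS
SELECT DISTINCT ON (name, staff_id)
       name, staff_id, work_code, location
FROM   person
ORDER  BY name, staff_id, work_code;
```

If several rows share the same minimum work_code, which of them survives is not determined by this ordering. Add more columns to the ORDER BY if that matters.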
I arbitrarily pick the “first” row per (name, staff_id): the minimum work_code and the matching location.
2.) Empty table:
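Emptying the table could look like this in PostgreSQL. TRUNCATE is one option; a plain DELETE FROM person would also work, only more slowly on big tables:

```sql
-- Removes all rows from person. In PostgreSQL this can be rolled back
-- if it runs inside a transaction.
TRUNCATE person;
```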
3.) Re-INSERT unique tuples:
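A sketch of the re-insert, assuming the de-duplicated rows sit in a temporary table called tmp (a hypothetical name) with the same four columns as person:

```sql
-- Copy the unique rows back; listing columns explicitly avoids
-- depending on column order.
INSERT INTO person (name, staff_id, work_code, location)
SELECT name, staff_id, work_code, location
FROM   tmp;
```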
Make sure dupes don’t creep back in. Add a surrogate primary key:
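For example, in PostgreSQL (the column name person_id is just an assumption for illustration):

```sql
-- Adds an auto-numbered column and makes it the primary key.
-- Existing rows get their numbers assigned automatically.
ALTER TABLE person ADD COLUMN person_id serial PRIMARY KEY;
```

Note that a surrogate key alone does not stop a second row with the same (name, staff_id) from being inserted. A UNIQUE constraint on those two columns would be needed for that.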
Or just add a multi-column primary key:
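A sketch of the multi-column variant:

```sql
-- Enforces uniqueness of (name, staff_id) from now on.
-- A primary key also implies NOT NULL, so this fails if either
-- column still contains NULL values.
ALTER TABLE person ADD PRIMARY KEY (name, staff_id);
```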
The temporary table will be dropped at the end of the session automatically.
Of course, all of this is best done inside one transaction, so you don’t lose anything in the unlikely case that you run into a problem halfway. Some clients do that automatically for a batch of SQL statements executed at once.
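Wrapped in an explicit transaction, the whole procedure might look roughly like this in PostgreSQL. Again, tmp is a hypothetical table name, and the choice of "first" row (minimum work_code) is arbitrary:

```sql
BEGIN;

-- 1.) Unique rows into a scratch table ("tmp" is an illustrative name)
CREATE TEMP TABLE tmp AS
SELECT DISTINCT ON (name, staff_id)
       name, staff_id, work_code, location
FROM   person
ORDER  BY name, staff_id, work_code;

-- 2.) Empty the original table
TRUNCATE person;

-- 3.) Put the unique rows back
INSERT INTO person (name, staff_id, work_code, location)
SELECT name, staff_id, work_code, location
FROM   tmp;

-- If anything above raised an error, issue ROLLBACK instead
-- and person is left untouched.
COMMIT;
```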