I already have 80 million records inserted into a table, but need to ensure a few columns are jointly unique. However, the columns already contain non-unique data, so ALTER TABLE doesn’t work.
I’d like either a query that will let me easily delete records that are non-unique, while keeping one of them, or one that will allow me to load the data from the current table into a new one, while filtering for uniqueness.
The query you’re looking for is:
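The query itself did not survive in this copy of the answer, so what follows is a minimal sketch rather than the original text. It assumes PostgreSQL, which is where `distinct on` is available, and uses hypothetical names: `tbl` for the table and `col1`, `col2`, `col3` for the columns that must be jointly unique.

```sql
-- Hypothetical names: tbl is the table, col1..col3 are the columns
-- that should end up jointly unique.
-- Returns one row per distinct (col1, col2, col3) combination.
SELECT DISTINCT ON (col1, col2, col3) *
FROM tbl;
```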
This selects one row for each combination of the columns listed in `distinct on`, namely the first row of each group. It is rarely used without `order by`, since without it there is no reliable order in which the rows are returned, and so no telling which row is the first one.

Combined with `order by`, you can choose which row comes first, for example keeping the row with the greatest last_update_date in each group, and then select the result into a new table:
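A sketch of both steps, again with hypothetical names (`tbl`, `tbl_dedup`, `col1`..`col3`, and the constraint name); only `last_update_date` comes from the answer itself. In PostgreSQL the `distinct on` expressions have to match the leftmost `order by` expressions, which is why the key columns are repeated in the ordering. `NULLS LAST` is an added assumption so that rows with no update date are not preferred.

```sql
-- Step 1: pick the most recently updated row in each group.
SELECT DISTINCT ON (col1, col2, col3) *
FROM tbl
ORDER BY col1, col2, col3, last_update_date DESC NULLS LAST;

-- Step 2: materialize the same result as a new table
-- (tbl_dedup is a hypothetical name).
CREATE TABLE tbl_dedup AS
SELECT DISTINCT ON (col1, col2, col3) *
FROM tbl
ORDER BY col1, col2, col3, last_update_date DESC NULLS LAST;

-- The uniqueness constraint can now be added without conflicts.
ALTER TABLE tbl_dedup
    ADD CONSTRAINT tbl_dedup_uniq UNIQUE (col1, col2, col3);
```

Note that `CREATE TABLE ... AS` copies only the data, so indexes, defaults, and other constraints from the old table would need to be recreated separately.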
Or you can use it for a delete, assuming `row_id` is a primary key:
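One way this delete could look, as a sketch with the same hypothetical names (`tbl`, `col1`..`col3`); `row_id` and `last_update_date` are taken from the answer. The subquery lists the `row_id` of the row to keep in each group, and everything else is removed.

```sql
-- Delete every row whose row_id is not the chosen "first" row
-- of its (col1, col2, col3) group.
DELETE FROM tbl
WHERE row_id NOT IN (
    SELECT DISTINCT ON (col1, col2, col3) row_id
    FROM tbl
    ORDER BY col1, col2, col3, last_update_date DESC NULLS LAST
);
```

Because `row_id` is a primary key it cannot be null, so `NOT IN` behaves as expected here. With 80 million rows it is probably worth trying this on a copy or inside a transaction first, and comparing it against the new-table approach.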