I have a large table (~1,000,000 rows) that may contain duplicate values.
The table has an ID column, a last-update date, and two columns (for example, a and b) that together should form a unique key.
For example, I can have a table like this:
id | a | b | update
1 | jon | smith | 1/1
2 | don | smith | 2/5
3 | bob | david | 1/1
4 | dan | lewis | 3/1
5 | bob | david | 3/1
As you can see, the rows with id 3 and id 5 contain the same values in both the a and b columns.
I would like to delete the rows that contain this kind of duplication, but keep the most recently updated row.
For this example, the table after deletion would be:
id | a | b | update
1 | jon | smith | 1/1
2 | don | smith | 2/5
4 | dan | lewis | 3/1
5 | bob | david | 3/1
(id = 3 is deleted, since I already have a=bob and b=david in the row where id=5, and the update date in that row is later than the one in the deleted row.)
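
To make the desired behavior concrete, here is a minimal sketch that reproduces the example on an in-memory SQLite database from Python. Several things in it are assumptions that do not come from the question: the table name `people`, the column name `updated` (used instead of `update`, which is a reserved word in SQL), the ISO-formatted dates standing in for 1/1, 2/5 and 3/1, and the tie-break rule that keeps the higher id when two duplicates share the same update date. The actual database engine is not stated, so the SQL may need adjusting for other systems.

```python
import sqlite3

# Hypothetical schema and data mirroring the example table.
# Dates are written as ISO strings (an assumption) so they compare correctly as text.
rows = [
    (1, "jon", "smith", "2020-01-01"),
    (2, "don", "smith", "2020-02-05"),
    (3, "bob", "david", "2020-01-01"),
    (4, "dan", "lewis", "2020-03-01"),
    (5, "bob", "david", "2020-03-01"),
]

def delete_older_duplicates(conn):
    """Delete every row for which a newer row with the same (a, b) exists."""
    conn.execute(
        """
        DELETE FROM people
        WHERE EXISTS (
            SELECT 1
            FROM people AS newer
            WHERE newer.a = people.a
              AND newer.b = people.b
              AND (
                    newer.updated > people.updated
                    -- Assumed tie-break: on equal dates keep the higher id.
                    OR (newer.updated = people.updated AND newer.id > people.id)
                  )
        )
        """
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, a TEXT, b TEXT, updated TEXT)"
)
conn.executemany("INSERT INTO people VALUES (?, ?, ?, ?)", rows)

delete_older_duplicates(conn)

remaining = conn.execute(
    "SELECT id, a, b, updated FROM people ORDER BY id"
).fetchall()
for row in remaining:
    print(row)
```

On a table of about 1,000,000 rows, an index on (a, b) would probably be needed for the correlated subquery to run in reasonable time, and it may be worth running the delete inside a transaction on a copy of the data first.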