I am trying to write some SQL to automatically delete some records from a database, but I’m having trouble with the logic and would like help with the query.
Basically, I have a table called image. It has columns for a main ID, a secondary ID, a type, a year, and a date.
A main ID may have multiple secondary IDs, so there can be multiple rows with the same main ID and different secondary IDs. Each secondary ID always has at least two rows: one of type small and one of type large. Some secondary IDs have duplicate data, so there may be, say, 6 rows (an arbitrary number): a small and a large row, each repeated 3 times, for a single secondary ID under a single main ID. All of these can belong to a single year and then be repeated for the next year.
That is hard to grasp, so here is an example of some data. I’m not good at formatting, so each row lists the columns in the order given above: main ID, secondary ID, type, year, and date.
Example:
1000 3000 Small 2010 2010-11-28
1000 3000 Large 2010 2010-11-28
1000 3000 Small 2010 2010-11-29
1000 3000 Large 2010 2010-11-29
1000 3000 Small 2011 2010-11-30
1000 3000 Large 2011 2010-11-30
1000 3001 Small 2010 2010-11-28
1000 3001 Large 2010 2010-11-28
1000 3001 Small 2010 2010-11-28
1000 3001 Large 2010 2010-11-28
1000 3001 Small 2011 2010-11-28
1000 3001 Large 2011 2010-11-28
You can see that there may be duplicate rows for a single secondary ID with the same date and year. There are also rows that count as duplicates because they share the same secondary ID and year, even though the date is a day off or otherwise different.
Basically, what I need is a query that goes through a table filled with data like this and deletes the records that are not needed.
I want to keep only two records, a small and a large, for each secondary ID, for each year (not date), for each main ID in the image table, keeping the newest by date.
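To make the rule concrete, here is a rough sketch that only previews the groups, without deleting anything. It loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. The column names (main_id, secondary_id, type, year, date) are assumptions, since only the table name image is given, and the dates are assumed to be stored as ISO-formatted text so that MAX() picks the newest one.

```python
import sqlite3

# Sample rows from the example: main ID, secondary ID, type, year, date.
rows = [
    (1000, 3000, "Small", 2010, "2010-11-28"),
    (1000, 3000, "Large", 2010, "2010-11-28"),
    (1000, 3000, "Small", 2010, "2010-11-29"),
    (1000, 3000, "Large", 2010, "2010-11-29"),
    (1000, 3000, "Small", 2011, "2010-11-30"),
    (1000, 3000, "Large", 2011, "2010-11-30"),
    (1000, 3001, "Small", 2010, "2010-11-28"),
    (1000, 3001, "Large", 2010, "2010-11-28"),
    (1000, 3001, "Small", 2010, "2010-11-28"),
    (1000, 3001, "Large", 2010, "2010-11-28"),
    (1000, 3001, "Small", 2011, "2010-11-28"),
    (1000, 3001, "Large", 2011, "2010-11-28"),
]

conn = sqlite3.connect(":memory:")
# Column names are assumed for illustration; only the table name is known.
conn.execute(
    "CREATE TABLE image ("
    "main_id INTEGER, secondary_id INTEGER, type TEXT, year INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO image VALUES (?, ?, ?, ?, ?)", rows)

# One group per (main ID, secondary ID, type, year): the newest date in the
# group and how many copies currently exist.
newest = conn.execute(
    """
    SELECT main_id, secondary_id, type, year,
           MAX(date) AS newest_date,
           COUNT(*)  AS copies
    FROM image
    GROUP BY main_id, secondary_id, type, year
    ORDER BY main_id, secondary_id, year, type
    """
).fetchall()

for group in newest:
    print(group)
```

Any group with copies greater than 1 has extra rows. Note that a group can contain several rows sharing the newest date (as with secondary ID 3001 in 2010), so the date alone is not enough to pick a single row to keep.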
So, for example, I would expect these records to be left after running the script:
1000 3000 Small 2010 2010-11-29
1000 3000 Large 2010 2010-11-29
1000 3000 Small 2011 2010-11-30
1000 3000 Large 2011 2010-11-30
1000 3001 Small 2010 2010-11-28
1000 3001 Large 2010 2010-11-28
1000 3001 Small 2011 2010-11-28
1000 3001 Large 2011 2010-11-28
Again, this is only an example for a single main ID and a couple of secondary IDs; there could be any number of main IDs in this table.
How can I go about writing a query that will delete the extra rows in this image table as defined by my example?
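One possible starting point is sketched below, again using SQLite through Python's sqlite3 module. It makes several assumptions: the column names (main_id, secondary_id, type, year, date) are made up for illustration, dates are stored as ISO-formatted text, and SQLite's implicit rowid is used to break ties between rows that are identical in every column. Other databases have no rowid, so there a unique key column or a ROW_NUMBER() window function would be needed instead.

```python
import sqlite3

# Sample rows from the example: main ID, secondary ID, type, year, date.
rows = [
    (1000, 3000, "Small", 2010, "2010-11-28"),
    (1000, 3000, "Large", 2010, "2010-11-28"),
    (1000, 3000, "Small", 2010, "2010-11-29"),
    (1000, 3000, "Large", 2010, "2010-11-29"),
    (1000, 3000, "Small", 2011, "2010-11-30"),
    (1000, 3000, "Large", 2011, "2010-11-30"),
    (1000, 3001, "Small", 2010, "2010-11-28"),
    (1000, 3001, "Large", 2010, "2010-11-28"),
    (1000, 3001, "Small", 2010, "2010-11-28"),
    (1000, 3001, "Large", 2010, "2010-11-28"),
    (1000, 3001, "Small", 2011, "2010-11-28"),
    (1000, 3001, "Large", 2011, "2010-11-28"),
]

conn = sqlite3.connect(":memory:")
# Column names are assumed for illustration; only the table name is known.
conn.execute(
    "CREATE TABLE image ("
    "main_id INTEGER, secondary_id INTEGER, type TEXT, year INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO image VALUES (?, ?, ?, ?, ?)", rows)

# Delete a row when a "better" row exists in the same
# (main ID, secondary ID, type, year) group: one with a later date, or with
# the same date and a higher rowid (SQLite-specific tiebreaker).
delete_sql = """
    DELETE FROM image
    WHERE EXISTS (
        SELECT 1
        FROM image AS newer
        WHERE newer.main_id      = image.main_id
          AND newer.secondary_id = image.secondary_id
          AND newer.type         = image.type
          AND newer.year         = image.year
          AND (newer.date > image.date
               OR (newer.date = image.date AND newer.rowid > image.rowid))
    )
"""

with conn:  # commits on success, rolls back if an exception is raised
    deleted = conn.execute(delete_sql).rowcount

remaining = conn.execute(
    "SELECT main_id, secondary_id, type, year, date FROM image "
    "ORDER BY main_id, secondary_id, year, type DESC"
).fetchall()

print("deleted:", deleted)
for row in remaining:
    print(row)
```

On the sample data this leaves one small and one large row per secondary ID per year. Before running anything like it against real data, it is worth changing the DELETE into a SELECT with the same WHERE clause to inspect which rows would be removed.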
1 Answer