I’ve the following database structure:
id idproperty idgbs
1 1 136
2 1 128
3 1 10
4 1 1
5 2 136
6 2 128
7 2 10
8 2 1
9 3 561
10 3 560
11 3 10
12 3 1
13 4 561
14 4 560
15 4 10
16 4 1
17 5 234
18 5 120
19 5 1
20 6 234
21 6 120
22 6 1
Here are the details:
The table refers idproperty with different geographic location. For example:
idgbs
1 refers to United States
10 refers to Alabama with parentid 1 (United States)
128 refers to Alabama Gulf Coast with parentid 10 (Alabama)
136 Dauphin Island with parentid 128 (Alabama Gulf Coast)
So, the structure is:
United States > Alabama > Alabama Gulf Coast > Dauphin Island
I want to delete all entries for idproperty EXCEPT the first with the set of idgbs 136, 128, 10, 1 i.e. leave atleast 1 property in all GBS and delete others.
Also, sometimes it is 4 level of geographic entries, sometimes it is 3 level.
Please share the logic & SQL query to delete all entries except one in every unique GBS.
GBS 1, 10, 128, 136 is one unique, so database should only contain 1 property id with these GBS.
After the query, the table would look like this:
id idproperty idgbs
1 1 136
2 1 128
3 1 10
4 1 1
9 3 561
10 3 560
11 3 10
12 3 1
17 5 234
18 5 120
19 5 1
Rephrasing the question:
I want to keep properties in every root level GBS i.e. there should be only ONE property in Dauphin Island.
Whew… I think I understand what you are after now. I couldn’t let this one go 😉
I had to realize that in the question, you wanted property 2 deleted, because it shared a hierarchy with property 1. Once I realized that, I got the following idea. Basically, we join to an aggregated version of self twice: the first one tells us what our "gbs hierarchy path" is, and the second one matches any previous properties with the same hierarchy. Rows which find that there are no "previous" properties that share their hierarchy are spared, the rest with that hierarchy are deleted. It’s possible that this could be further tweaked, but I wanted to share this now. I have tested it with the data you showed, and I got the results you posted.
Note that the last line is not a typo, we are just checking if there is a previous property with the same path. If there is, then delete the currently-evaluated row.
Cheers!
note for clarification
The thought here is to associate every row with it’s hierarchy, even if it’s a row which represents somewhere in the middle of the hierarchy (such as row: {2, 1, 128} in the question). With the first join to the aggregate, each row now "knows" what it’s path is (so that row would get "136/128/10/1"). We can then use that value in the second join to find other properties with the same path, but only if they have a LOWER property id. This allows us to check for the existence of a lower-ID property with the same "path", and delete any row which represents a property which does have such a "lower-order path-sibling".