I have a table with the format:
Id | Loc |
-------|-----|
789-A | 4 |
123 | 1 |
123-BZ | 1 |
123-CG | 2 |
456 | 2 |
456 | 3 |
789 | 4 |
I want to exclude certain rows from the result of a query based on whether a duplicate exists. In this case, though, the definition of a duplicate row is fairly complex:
If any row returned by the query (call it ThisRow) has a counterpart row, also in the query results, whose Loc equals ThisRow.Loc AND whose Id has the form <ThisRow.Id>-<an alphanumeric suffix>, then ThisRow is considered a duplicate and should be excluded from the query results.
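A minimal sketch of the rule in plain Python, applied row by row to the sample data. It assumes that "an alphanumeric suffix" means one or more characters from [A-Za-z0-9] with nothing after them; `ROWS`, `is_suffixed_copy`, `is_duplicate` and `deduplicate` are hypothetical names used only for illustration:

```python
import re

# Sample rows from the table above, as (Id, Loc) pairs.
ROWS = [
    ("789-A", 4),
    ("123", 1),
    ("123-BZ", 1),
    ("123-CG", 2),
    ("456", 2),
    ("456", 3),
    ("789", 4),
]


def is_suffixed_copy(candidate_id: str, base_id: str) -> bool:
    """True if candidate_id is exactly '<base_id>-<one or more alphanumerics>'."""
    # Assumption: the suffix is [A-Za-z0-9]+ and must run to the end of the Id.
    return re.fullmatch(re.escape(base_id) + r"-[A-Za-z0-9]+", candidate_id) is not None


def is_duplicate(this_row, rows) -> bool:
    """ThisRow is a duplicate if some row shares its Loc and extends its Id."""
    this_id, this_loc = this_row
    return any(
        loc == this_loc and is_suffixed_copy(other_id, this_id)
        for other_id, loc in rows
    )


def deduplicate(rows):
    """Keep only rows that have no suffixed counterpart at the same Loc."""
    return [row for row in rows if not is_duplicate(row, rows)]


for row in deduplicate(ROWS):
    print(row)
```

On the sample data this drops ("123", 1) and ("789", 4) and keeps the other five rows in their original order.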
For example, given the table above, the query (conceptually SELECT * FROM table with duplicates removed) should return the following result set:
Id | Loc |
-------|-----|
789-A | 4 |
123-BZ | 1 |
123-CG | 2 |
456 | 2 |
456 | 3 |
I understand how to write the string-matching condition, where CounterpartRow is the row carrying the suffix:
CounterpartRow.Id REGEXP CONCAT('^', ThisRow.Id, '-[A-Za-z0-9]+$')
and the straight comparison of Loc:
CounterpartRow.Loc = ThisRow.Loc
What I don’t understand is how to combine these conditions into a (self-referential?) query.
Here’s one way:
SQL Fiddle example
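As a hedged sketch (not necessarily the query in the fiddle), a correlated NOT EXISTS subquery expresses the rule directly: keep a row only if no other row at the same Loc has an Id of the form <Id>-<suffix>. It runs against an in-memory SQLite database through Python's sqlite3 module as a stand-in for MySQL. SQLite has no built-in REGEXP implementation, so one is registered, and SQLite concatenates with || where MySQL would use CONCAT(). The table name t and the aliases are hypothetical:

```python
import re
import sqlite3

ROWS = [
    ("789-A", 4), ("123", 1), ("123-BZ", 1), ("123-CG", 2),
    ("456", 2), ("456", 3), ("789", 4),
]


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE t (Id TEXT, Loc INTEGER)")
conn.executemany("INSERT INTO t (Id, Loc) VALUES (?, ?)", ROWS)

# Keep this_row unless some other_row at the same Loc extends its Id with -<suffix>.
QUERY = """
SELECT this_row.Id, this_row.Loc
FROM t AS this_row
WHERE NOT EXISTS (
    SELECT 1
    FROM t AS other_row
    WHERE other_row.Loc = this_row.Loc
      AND other_row.Id REGEXP ('^' || this_row.Id || '-[A-Za-z0-9]+$')
)
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Because the Id is spliced into the pattern unescaped, Ids containing regex metacharacters such as . or + would need escaping before this is safe on real data.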
Or, the same query using an anti-join (which may be a little faster, depending on the optimizer):
SQL Fiddle example
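A comparable sketch of the anti-join form, under the same assumptions (SQLite via Python standing in for MySQL, hypothetical table t and aliases, a registered REGEXP function). The LEFT JOIN pairs each row with any suffixed counterpart at the same Loc, and WHERE other_row.Id IS NULL keeps only the rows for which no counterpart matched:

```python
import re
import sqlite3

ROWS = [
    ("789-A", 4), ("123", 1), ("123-BZ", 1), ("123-CG", 2),
    ("456", 2), ("456", 3), ("789", 4),
]


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern comes first.
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute("CREATE TABLE t (Id TEXT, Loc INTEGER)")
conn.executemany("INSERT INTO t (Id, Loc) VALUES (?, ?)", ROWS)

# Unmatched rows get NULLs on the other_row side; those are the non-duplicates.
ANTI_JOIN = """
SELECT this_row.Id, this_row.Loc
FROM t AS this_row
LEFT JOIN t AS other_row
  ON other_row.Loc = this_row.Loc
 AND other_row.Id REGEXP ('^' || this_row.Id || '-[A-Za-z0-9]+$')
WHERE other_row.Id IS NULL
"""

anti_result = conn.execute(ANTI_JOIN).fetchall()
for row in anti_result:
    print(row)
```

Rows that do match may join to several counterparts, but they are all filtered out, so each surviving row appears once. Whether this form actually beats NOT EXISTS depends on the engine; comparing EXPLAIN output on real data is the reliable check.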