The scenario is simple. I have 4 tables, A -table, B -table, C1 -table and C2 -table. A is a root level table, B references A, and C1 and C2 reference B. But each B.ID can only be referenced by either C1 or C2, never both. The results are exported to a .CSV -file which is then used for a variety of purposes, and the question here has to do with readability as well as making it easier to manage the information in external software.
I wrote a query that returns all data in all 4 tables keeping the relations intact, ordering them by A, B, C1 and C2.
SELECT A.*, B.*, C1.*, C2.*
FROM A
JOIN B
LEFT JOIN C1
LEFT JOIN C2
ORDER BY A.ID, B.ID, etc.
And got this:
A.ID | B.ID | C1.ID | C2.ID
1| 1| 1| NULL
1| 1| 2| NULL
1| 2| 1| NULL
1| 2| 2| NULL
1| 2| 3| NULL
2| 1| NULL| 1
2| 1| NULL| 2
....
Now, the question here is this: How do I return only the first distinct row for each join, so that the resultset doesn’t get clogged with redundant data. Basically, the result above should produce this:
A.ID | B.ID | C1.ID | C2.ID
1| 1| 1| NULL
| | 2| NULL
| 2| 1| NULL
| | 2| NULL
| | 3| NULL
2| 1| NULL| 1
| | NULL| 2
....
I can probably do this by making each join a subquery and partitioning the results by rank, or alternatively creating a temporary table and slam the results there with the required logic, but since this will be used in a console app, I’d like to keep the solution as clean, simple and optimized as possible.
Any ideas?
This is reporting / formatting, not data, so it should be handled by the application, not by SQL.
That said, this will produce something close to your requirements