I’m working against an MS SQL Server database, with a SQL query like…
SELECT id, type, start, stop, one, two, three, four
FROM a
UNION ALL
SELECT id, type, start, stop, one, two, three, four
FROM b
UNION ALL
SELECT id, type, start, stop, one, two, three, four
FROM c
ORDER BY type ASC
Resulting in…
row | id  type  start       stop        one  two   three  four
----+---------------------------------------------------------
1   | 1   a     2010-01-01  2010-01-31  100  1000  1000   100
2   | 1   a     2010-02-01  2010-12-31  100  500   500    50
3   | 1   b     2010-01-01  2010-01-31  100  NULL  NULL   100
4   | 1   b     2010-01-01  2010-12-31  100  NULL  NULL   100
5   | 1   c     2010-01-01  2010-01-31  0    NULL  NULL   100
6   | 1   c     2010-01-01  2010-12-31  0    NULL  NULL   100
However, I would much prefer the following outcome…
row | id  type  start       stop        one  two   three  four
----+---------------------------------------------------------
1   | 1   a     2010-01-01  2010-01-31  100  1000  1000   100
2   | 1   a     2010-02-01  2010-12-31  100  500   500    50
4   | 1   b     2010-01-01  2010-12-31  100  NULL  NULL   100
6   | 1   c     2010-01-01  2010-12-31  0    NULL  NULL   100
That is, rows 3 and 5 should be eliminated, since they duplicate rows 4 and 6 in every column except stop; of each such pair, the row with the lower stop value is the one to remove.
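Stated as a procedure, the rule is: key each row on every column except stop, and within each key keep the row with the highest stop. A small Python sketch of that rule on the sample rows follows. The tuple layout (id, type, start, stop, one, two, three, four) is an assumption matching the column order of the query, and None stands in for NULL. In the key, None matches None, which mirrors how GROUP BY puts NULLs into one group.

```python
# Sample rows from the question: (id, type, start, stop, one, two, three, four).
# None stands in for SQL NULL.
rows = [
    (1, "a", "2010-01-01", "2010-01-31", 100, 1000, 1000, 100),
    (1, "a", "2010-02-01", "2010-12-31", 100, 500, 500, 50),
    (1, "b", "2010-01-01", "2010-01-31", 100, None, None, 100),
    (1, "b", "2010-01-01", "2010-12-31", 100, None, None, 100),
    (1, "c", "2010-01-01", "2010-01-31", 0, None, None, 100),
    (1, "c", "2010-01-01", "2010-12-31", 0, None, None, 100),
]


def keep_latest_stop(rows):
    """Collapse rows that are identical except for stop (index 3),
    keeping the one with the highest stop value."""
    best = {}
    for row in rows:
        key = row[:3] + row[4:]  # every column except stop
        # ISO-8601 date strings compare correctly as plain strings.
        if key not in best or row[3] > best[key][3]:
            best[key] = row
    # Dicts keep first-insertion order, so the original row order survives.
    return list(best.values())


result = keep_latest_stop(rows)
for row in result:
    print(row)
```

This keeps rows 1, 2, 4 and 6. Rows 1 and 2 survive because they differ in more than just stop.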
How can I accomplish this?
I’ve been thinking something like…
SELECT * FROM (
SELECT id, type, start, stop, one, two, three, four
FROM a
UNION ALL
SELECT id, type, start, stop, one, two, three, four
FROM b
UNION ALL
SELECT id, type, start, stop, one, two, three, four
FROM c
ORDER BY type ASC
) AS types
GROUP BY ... HAVING ???
I need guidance, please help.
(And no, I’m in no position to change any conditions; I’ve got to work with the given situation.)
Similar questions have been asked and answered. For example: Select uniques, and one of the doubles
And your situation is even simpler (if I understood your problem description correctly):
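A minimal sketch of that simpler case, assuming one, two, three and four depend only on (id, type, start): group on every column except stop and keep MAX(stop). It runs the SQL through Python's built-in sqlite3 module on the question's sample rows. The table contents are a hypothetical setup, with type 'a' rows in table a and so on. The dialect is therefore SQLite rather than T-SQL, although this particular query uses only plain GROUP BY and MAX and should read the same in SQL Server.

```python
import sqlite3

# Hypothetical setup: the question's sample rows, assuming type 'a' rows
# live in table a, 'b' rows in table b and 'c' rows in table c.
# None is stored as SQL NULL.
SAMPLE = [
    (1, "a", "2010-01-01", "2010-01-31", 100, 1000, 1000, 100),
    (1, "a", "2010-02-01", "2010-12-31", 100, 500, 500, 50),
    (1, "b", "2010-01-01", "2010-01-31", 100, None, None, 100),
    (1, "b", "2010-01-01", "2010-12-31", 100, None, None, 100),
    (1, "c", "2010-01-01", "2010-01-31", 0, None, None, 100),
    (1, "c", "2010-01-01", "2010-12-31", 0, None, None, 100),
]

conn = sqlite3.connect(":memory:")
for table in ("a", "b", "c"):
    conn.execute(
        f"CREATE TABLE {table} (id, type, start, stop, one, two, three, four)"
    )
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [row for row in SAMPLE if row[1] == table],
    )

# Group on every column except stop and keep the latest stop per group.
# GROUP BY puts NULLs into the same group, so rows 3/4 and 5/6 collapse.
QUERY = """
SELECT id, type, start, MAX(stop) AS stop, one, two, three, four
FROM (
    SELECT id, type, start, stop, one, two, three, four FROM a
    UNION ALL
    SELECT id, type, start, stop, one, two, three, four FROM b
    UNION ALL
    SELECT id, type, start, stop, one, two, three, four FROM c
) AS t  -- no ORDER BY inside the derived table
GROUP BY id, type, start, one, two, three, four
ORDER BY type, start
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Here the parenthesized UNION ALL plays the role of the (...) placeholder, and sorting happens only in the outer query.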
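In place of (...) you put your selects from a, b and c. Just leave out the ORDER BY clause (put it on the outer query if you need sorted output): SQL Server rejects ORDER BY inside a derived table unless TOP, OFFSET or FOR XML is also specified.
Or, if instead of (id, type, start)->(one, two, three, four) you have (id, type, start, stop)->(one, two, three, four), meaning the other columns can differ between rows with the same (id, type, start) and you have to choose the values that belong to the row with max(stop), this query usually results in a sensible execution plan: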
but it depends on how the data is distributed among your source tables and which indexes are present. In some cases the solutions from the link above might still be better.
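For the second case, where one, two, three and four belong to a particular stop, a common way to pick the whole row with the latest stop is a ROW_NUMBER() window. The window is partitioned by (id, type, start) and ordered by stop DESC, and only rn = 1 is kept. This is a swapped-in technique shown as a sketch, not necessarily the query the answer refers to. SQL Server supports ROW_NUMBER() from 2005 on, and SQLite from 3.25, which is what Python's sqlite3 module runs here. The table contents are a hypothetical setup built from the question's sample rows, with type 'a' rows in table a and so on.

```python
import sqlite3

# Hypothetical setup: the question's sample rows, assuming type 'a' rows
# live in table a, 'b' rows in table b and 'c' rows in table c.
SAMPLE = [
    (1, "a", "2010-01-01", "2010-01-31", 100, 1000, 1000, 100),
    (1, "a", "2010-02-01", "2010-12-31", 100, 500, 500, 50),
    (1, "b", "2010-01-01", "2010-01-31", 100, None, None, 100),
    (1, "b", "2010-01-01", "2010-12-31", 100, None, None, 100),
    (1, "c", "2010-01-01", "2010-01-31", 0, None, None, 100),
    (1, "c", "2010-01-01", "2010-12-31", 0, None, None, 100),
]

conn = sqlite3.connect(":memory:")
for table in ("a", "b", "c"):
    conn.execute(
        f"CREATE TABLE {table} (id, type, start, stop, one, two, three, four)"
    )
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [row for row in SAMPLE if row[1] == table],
    )

# Number the rows within each (id, type, start) group, latest stop first,
# then keep only the first row of each group with all of its columns.
QUERY = """
SELECT id, type, start, stop, one, two, three, four
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY id, type, start
                              ORDER BY stop DESC) AS rn
    FROM (
        SELECT id, type, start, stop, one, two, three, four FROM a
        UNION ALL
        SELECT id, type, start, stop, one, two, three, four FROM b
        UNION ALL
        SELECT id, type, start, stop, one, two, three, four FROM c
    ) AS t
) AS ranked
WHERE rn = 1
ORDER BY type, start
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

If two rows tie on the latest stop, ROW_NUMBER() keeps an arbitrary one of them; RANK() in its place would keep all tied rows.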