I have a SQL question that is closely related to this question: “SQL – Need to find duplicate records but EXCLUDE reversed transactions”.
I need to remove all reversal “pairs” from a recordset using (if possible) non-procedural SQL. The specific RDBMS is Oracle 11g, but I would like the SQL to be as generic as possible so that the same strategy can be used in SQL Server 2008. The example recordset looks like this:
ROW | DATE      | QTY   | FUEL_TYPE | REVERSAL |
1   | 01-MAY-12 | 23.3  | DSL       | N        |
2   | 01-MAY-12 | -23.3 | DSL       | Y        |
3   | 01-MAY-12 | 23.3  | DSL       | N        |
4   | 01-MAY-12 | 23.3  | DSL       | N        |
5   | 01-MAY-12 | 23.3  | DSL       | N        |
6   | 01-MAY-12 | 18.6  | DSL       | N        |
7   | 01-MAY-12 | -18.6 | DSL       | Y        |
8   | 01-MAY-12 | 14.9  | GAS       | N        |
The desired outcome of the query would reduce this recordset to:
ROW | DATE      | QTY   | FUEL_TYPE | REVERSAL |
3   | 01-MAY-12 | 23.3  | DSL       | N        |
4   | 01-MAY-12 | 23.3  | DSL       | N        |
5   | 01-MAY-12 | 23.3  | DSL       | N        |
8   | 01-MAY-12 | 14.9  | GAS       | N        |
Notice that duplicates are possible, but the reversal “pairs” always need to be removed.
edit
The rows and row numbers are irrelevant and are just used to illustrate. It doesn’t really matter which records are removed, just that what gets removed is always a “pair”: a positive amount and its matching negative amount. So, for example, row 2 could be paired with row 1, 3, 4, or 5 and removed.
Also, the logic that populates the table and the table structure itself are controlled by vendor software, and a reversal record DOES NOT include the original id of the record that is being reversed. I don’t really have any control over this.
/edit
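One way to express that pairing in set-based SQL is to count the reversals in each group and discard that many originals. The following is a minimal sketch, not something tested on Oracle or SQL Server: it runs the query through Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The table name fuel_txn and the column names row_id and txn_date are hypothetical stand-ins, since ROW and DATE are reserved words. It also assumes that a reversal matches any original with the same date, fuel type, and absolute quantity.

```python
import sqlite3

# Sample rows from the question: (row_id, txn_date, qty, fuel_type, reversal).
ROWS = [
    (1, "2012-05-01", 23.3, "DSL", "N"),
    (2, "2012-05-01", -23.3, "DSL", "Y"),
    (3, "2012-05-01", 23.3, "DSL", "N"),
    (4, "2012-05-01", 23.3, "DSL", "N"),
    (5, "2012-05-01", 23.3, "DSL", "N"),
    (6, "2012-05-01", 18.6, "DSL", "N"),
    (7, "2012-05-01", -18.6, "DSL", "Y"),
    (8, "2012-05-01", 14.9, "GAS", "N"),
]

# rn numbers the rows separately for originals and reversals in each group;
# reversal_count is how many reversals the whole group contains.
# An original survives only if its number is greater than that count.
# The ORDER BY inside ROW_NUMBER is arbitrary: any ordering works, because
# it does not matter which duplicate is removed.
PAIRING_SQL = """
SELECT row_id, txn_date, qty, fuel_type, reversal
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY txn_date, fuel_type, ABS(qty), reversal
               ORDER BY row_id
           ) AS rn,
           SUM(CASE WHEN reversal = 'Y' THEN 1 ELSE 0 END) OVER (
               PARTITION BY txn_date, fuel_type, ABS(qty)
           ) AS reversal_count
    FROM fuel_txn t
) numbered
WHERE reversal = 'N'
  AND rn > reversal_count
ORDER BY row_id
"""


def remove_reversal_pairs(rows):
    """Load rows into an in-memory table and return those left after pairing."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE fuel_txn ("
            "row_id INTEGER, txn_date TEXT, qty REAL, "
            "fuel_type TEXT, reversal TEXT)"
        )
        conn.executemany("INSERT INTO fuel_txn VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(PAIRING_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in remove_reversal_pairs(ROWS):
        print(row)
```

On the sample data this keeps rows 3, 4, 5 and 8. Note that a reversal with no matching original is simply dropped by this query, which may or may not be what you want.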
Incidentally, I would love it if the MINUS keyword came in two variants, much like UNION and UNION ALL: MINUS would remove only a single matching row for each row in the second recordset, while MINUS ALL would remove every row that matches values from the second recordset. If that were the case, this problem would be trivial (at least for the way that my brain thinks).
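The one-for-one removal wished for here is a multiset (bag) difference. The SQL standard defines EXCEPT ALL with that meaning, which neither Oracle 11g nor SQL Server 2008 appears to support. A small Python sketch of the semantics follows, using collections.Counter on the sample rows. Keying each row by (date, absolute quantity, fuel type) is an assumption about what makes a reversal match an original.

```python
from collections import Counter

# Sample rows from the question: (date, qty, fuel_type, reversal).
ROWS = [
    ("01-MAY-12", 23.3, "DSL", "N"),
    ("01-MAY-12", -23.3, "DSL", "Y"),
    ("01-MAY-12", 23.3, "DSL", "N"),
    ("01-MAY-12", 23.3, "DSL", "N"),
    ("01-MAY-12", 23.3, "DSL", "N"),
    ("01-MAY-12", 18.6, "DSL", "N"),
    ("01-MAY-12", -18.6, "DSL", "Y"),
    ("01-MAY-12", 14.9, "GAS", "N"),
]


def bag_difference(rows):
    """Return originals minus reversals, removing one original per reversal."""
    originals = Counter((d, q, f) for d, q, f, rev in rows if rev == "N")
    # Reversals are keyed by the absolute quantity so they line up with originals.
    reversals = Counter((d, abs(q), f) for d, q, f, rev in rows if rev == "Y")
    # Counter subtraction decrements counts one-for-one and drops any key
    # whose count falls to zero or below.
    return originals - reversals


if __name__ == "__main__":
    for key, count in bag_difference(ROWS).items():
        print(key, "x", count)
```

Three of the four 23.3 DSL originals survive, the 18.6 DSL original is cancelled, and the GAS row is untouched.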
It turns out I was looking at the problem in a pretty terrible way. Instead of finding the exact reversal pair, I just did a SUM with a GROUP BY, so the reversals cancel out their originals and only the values that I cared about keeping remained.
The end result is that the remaining transactions end up distinct, especially if, as in my case, the transaction date column in the real table is actually a datetime rather than a date.
The only time this won’t produce the values that you really want is if you need to keep the ids of the individual transactions, or if multiple transactions can occur at the exact same time.
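The exact query is not shown above, so the following is only a sketch of what a SUM with GROUP BY might look like, run against the sample data through Python's built-in sqlite3 module. The table name fuel_txn, the column name txn_date, the choice of grouping columns, and the HAVING filter that drops fully reversed groups are all assumptions.

```python
import sqlite3

# Sample rows from the question: (txn_date, qty, fuel_type, reversal).
ROWS = [
    ("2012-05-01", 23.3, "DSL", "N"),
    ("2012-05-01", -23.3, "DSL", "Y"),
    ("2012-05-01", 23.3, "DSL", "N"),
    ("2012-05-01", 23.3, "DSL", "N"),
    ("2012-05-01", 23.3, "DSL", "N"),
    ("2012-05-01", 18.6, "DSL", "N"),
    ("2012-05-01", -18.6, "DSL", "Y"),
    ("2012-05-01", 14.9, "GAS", "N"),
]

# Each reversal cancels its original inside the group total.
# ROUND guards against floating-point residue; with an exact decimal
# column type it would not be needed.
NET_SQL = """
SELECT txn_date, fuel_type, ROUND(SUM(qty), 2) AS net_qty
FROM fuel_txn
GROUP BY txn_date, fuel_type
HAVING ROUND(SUM(qty), 2) <> 0
ORDER BY fuel_type
"""


def net_quantities(rows):
    """Return one (txn_date, fuel_type, net_qty) row per non-zero group."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE fuel_txn ("
            "txn_date TEXT, qty REAL, fuel_type TEXT, reversal TEXT)"
        )
        conn.executemany("INSERT INTO fuel_txn VALUES (?, ?, ?, ?)", rows)
        return conn.execute(NET_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in net_quantities(ROWS):
        print(row)
```

With a date-only column, the three surviving DSL rows collapse into a single total of about 69.9, which is exactly the caveat about transactions that share the same timestamp. With a real datetime per transaction, each group would normally hold one original and, at most, its reversal.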