I am using DB2 in this case, but I’m figuring this has a generic SQL answer. I have simplified the data as much as I can.
I am counting actions on things called “Claims”. Each claim has a unique claim number, and each action is timestamped in the format “hhmm”. Actually, I’m not counting actions, I’m counting action sessions. Most of the time, a person performs one action on one claim, and that’s one action session. But sometimes a person performs multiple actions on one claim, separated by a few seconds or a few minutes; that would also be one action session. If somebody performed an action on a claim at 10am and then another action on the same claim at 1pm, those would be two action sessions. For my purposes, the time window that separates one action session from two is 3 hours, but that’s arbitrary, of course. There is no worry of the window spanning midnight.
Also, I have read-only access to this data, and I have to do this in one statement. Thanks.
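Because ACTTIME is an hhmm number rather than a time value, the self-join further down gets away with plain integer arithmetic: `b.acttime - 300` stands for “3 hours earlier”. The quick Python check below assumes ACTTIME is an integer and the window is a whole number of hours, and its function names are made up for illustration. It suggests that shortcut agrees exactly with a minutes-based comparison for every pair of times in a day, even though intermediate values such as 1500 - 1 = 1499 are not valid times.

```python
# Assumption: ACTTIME is an integer in hhmm form (1518 means 15:18) and the
# window is a whole number of hours. Python rejects the literal 0928, so
# 09:28 is written as 928 here.


def hhmm_to_minutes(t: int) -> int:
    """Convert an hhmm integer such as 1518 to minutes since midnight (918)."""
    hours, minutes = divmod(t, 100)
    return hours * 60 + minutes


def same_session_by_minutes(earlier: int, later: int, hours: int = 3) -> bool:
    """True when `earlier` is between 1 minute and `hours` hours before `later`."""
    gap = hhmm_to_minutes(later) - hhmm_to_minutes(earlier)
    return 1 <= gap <= hours * 60


def same_session_by_hhmm(earlier: int, later: int, hours: int = 3) -> bool:
    """The SQL shortcut: earlier BETWEEN later - 100*hours AND later - 1."""
    return later - 100 * hours <= earlier <= later - 1


# Every valid hhmm value within one day (no window spans midnight).
ALL_TIMES = [h * 100 + m for h in range(24) for m in range(60)]


def predicates_agree(hours: int) -> bool:
    """Exhaustively compare both predicates over every pair of times in a day."""
    return all(
        same_session_by_minutes(a, b, hours) == same_session_by_hhmm(a, b, hours)
        for a in ALL_TIMES
        for b in ALL_TIMES
    )


print(same_session_by_hhmm(1425, 1512))  # DD: 47 minutes apart -> True
print(same_session_by_hhmm(928, 1518))   # FF: 5h50m apart -> False
print(predicates_agree(3))               # -> True
```

One thing the check surfaces: BETWEEN is inclusive, so an action exactly 3 hours after another (1000 and 1300) falls inside the window, and `same_session_by_hhmm(1000, 1300)` is True. If the 10am/1pm example above really should count as two sessions, the window needs an exclusive upper limit.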
So here’s some data (table ACTIONS):
CLAIM_NO ACTTIME
AA 1424
BB 1134
CC 1221
DD 1425
DD 1512
EE 1619
FF 0928
FF 1518
GG 1348
HH 1332
II 1350
I would like to turn that into:
CLAIM_NO ACTTIME
AA 1424
BB 1134
CC 1221
DD 1425
EE 1619
FF 0928
FF 1518
GG 1348
HH 1332
II 1350
(Note that the second DD record is gone, but the second FF record is still there).
I have accomplished this by joining the table to itself on equal CLAIM_NO, where one row’s ACTTIME is between 3 hours and 1 minute earlier than the other row’s. This gives me the rows that don’t belong (every action that has another action on the same claim in the preceding 3 hours), and then I use EXCEPT to eliminate them.
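with excepto as (
  -- rows (b) that have an earlier action (a) on the same claim
  -- between 3 hours (300 in hhmm) and 1 minute before them
  select a.claim_no, b.acttime
  from actions a
  join actions b
    on a.claim_no = b.claim_no
   and a.acttime between (b.acttime - 300) and (b.acttime - 1)
)
-- keep every row except those continuation rows
select * from actions except select * from excepto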
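To confirm the query above produces the desired output from the sample data, here is a minimal sketch that loads the ACTIONS rows into an in-memory SQLite database through Python’s built-in `sqlite3` module and runs the same query. SQLite is a stand-in for DB2 here, so this shows the logic, not DB2 behaviour or performance; `ROWS`, `run` and `SELF_JOIN_EXCEPT` are hypothetical names, and the only change to the query is an `order by 1, 2` so the output order is deterministic.

```python
import sqlite3

# Sample data from the question (09:28 is written as 928).
ROWS = [
    ("AA", 1424), ("BB", 1134), ("CC", 1221), ("DD", 1425), ("DD", 1512),
    ("EE", 1619), ("FF", 928), ("FF", 1518), ("GG", 1348), ("HH", 1332),
    ("II", 1350),
]

# The query from the question, plus ORDER BY column position for stable output.
SELF_JOIN_EXCEPT = """
with excepto as (
  select a.claim_no, b.acttime
  from actions a
  join actions b
    on a.claim_no = b.claim_no
   and a.acttime between (b.acttime - 300) and (b.acttime - 1)
)
select * from actions
except
select * from excepto
order by 1, 2
"""


def run(query: str) -> list:
    """Load the sample rows into an in-memory table and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table actions (claim_no text, acttime integer)")
        conn.executemany("insert into actions values (?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


for row in run(SELF_JOIN_EXCEPT):
    print(row)
```

Only the second DD row (1512, which is 47 minutes after 1425) is removed; both FF rows survive because they are 5h50m apart.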
But I’d like to do this with a single join, so that no EXCEPT is necessary. I’m hoping performance will be better: my real data has more rows, of course, and more columns involved in the EXCEPT, and that EXCEPT seems to slow the query down a lot. I’m also using a lot of temporary tables via the “with” clause (common table expressions), and together they seem much slower than the sum of their parts.
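Two standard ways to get the same rows without EXCEPT are an anti-join (written with NOT EXISTS, or as LEFT JOIN ... IS NULL) and a LAG window function that compares each action with the previous action on the same claim. The sketch below runs all three against the sample data, again using Python’s `sqlite3` module as a stand-in for DB2. The SQL is intended to be portable, but DB2 behaviour is an assumption, and whether your DB2 release supports LAG should be checked in its documentation. `ROWS`, `run` and the query names are hypothetical. Unlike EXCEPT, these forms do not collapse exact duplicate rows, and the LAG version assumes no two rows share the same CLAIM_NO and ACTTIME.

```python
import sqlite3

# Sample data from the question (09:28 is written as 928).
ROWS = [
    ("AA", 1424), ("BB", 1134), ("CC", 1221), ("DD", 1425), ("DD", 1512),
    ("EE", 1619), ("FF", 928), ("FF", 1518), ("GG", 1348), ("HH", 1332),
    ("II", 1350),
]

# Anti-join with NOT EXISTS: keep row b unless some action a on the same claim
# happened between 3 hours (300 in hhmm) and 1 minute before it.
NOT_EXISTS = """
select b.claim_no, b.acttime
from actions b
where not exists (
  select 1
  from actions a
  where a.claim_no = b.claim_no
    and a.acttime between b.acttime - 300 and b.acttime - 1
)
order by b.claim_no, b.acttime
"""

# The same anti-join spelled as LEFT JOIN ... IS NULL.
LEFT_JOIN_NULL = """
select b.claim_no, b.acttime
from actions b
left join actions a
  on a.claim_no = b.claim_no
 and a.acttime between b.acttime - 300 and b.acttime - 1
where a.claim_no is null
order by b.claim_no, b.acttime
"""

# No join at all: keep a row if it has no previous action on the claim, or the
# previous one is more than 3 hours earlier. Needs window functions
# (SQLite 3.25+); assumes no duplicate (claim_no, acttime) rows.
LAG_QUERY = """
select claim_no, acttime
from (
  select claim_no,
         acttime,
         lag(acttime) over (partition by claim_no order by acttime) as prev_time
  from actions
) t
where prev_time is null or prev_time < acttime - 300
order by claim_no, acttime
"""


def run(query: str) -> list:
    """Load the sample rows into an in-memory table and return the query's rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table actions (claim_no text, acttime integer)")
        conn.executemany("insert into actions values (?, ?)", ROWS)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


for name, query in [("NOT EXISTS", NOT_EXISTS),
                    ("LEFT JOIN ... IS NULL", LEFT_JOIN_NULL),
                    ("LAG", LAG_QUERY)]:
    print(name, run(query))
```

All three keep the same ten rows as the EXCEPT version on this data. Which one is fastest on the real table is a question for the DB2 access plan, not for this sketch.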
I feel a little silly for forgetting about this…
You don’t need the EXCEPT; there’s a join available called EXCEPTION that does exactly what you want (and I’ve used it heavily):
That gains you a result set of:
(Unfortunately, this isn’t going to cut it when somebody makes changes at least once every 3 hours: only the first action in such a chain will show up. I believe you need at least a 6-way self-join to detect the proper entries, and it’s somewhat convoluted, too; you may have better luck handling this on the application side.)
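If the intended rule is that a session lasts 3 hours from its *first* action (an assumption; the question doesn’t say which reading it wants), the application-side route can be a short loop over each claim’s actions in time order. Here is a minimal Python sketch; `session_starts` and `hhmm_to_minutes` are hypothetical helper names, and claim ZZ is made-up data. For a claim touched every two hours, it starts a new session once an action lands more than 3 hours after the current session began, whereas the self-join and anti-join rule keeps only the first action of the chain.

```python
from collections import defaultdict


def hhmm_to_minutes(t: int) -> int:
    """Convert an hhmm integer such as 1518 to minutes since midnight."""
    hours, minutes = divmod(t, 100)
    return hours * 60 + minutes


def session_starts(rows, window_minutes: int = 180):
    """Return the (claim_no, acttime) rows that open a new action session.

    Assumed rule: a session lasts `window_minutes` from its FIRST action
    (inclusive, matching BETWEEN in the SQL); any later action starts a new one.
    """
    times_by_claim = defaultdict(list)
    for claim_no, acttime in rows:
        times_by_claim[claim_no].append(acttime)

    starts = []
    for claim_no in sorted(times_by_claim):
        session_begin = None  # minutes since midnight of the session's first action
        for acttime in sorted(times_by_claim[claim_no]):
            now = hhmm_to_minutes(acttime)
            if session_begin is None or now - session_begin > window_minutes:
                session_begin = now
                starts.append((claim_no, acttime))
    return starts


# Hypothetical claim ZZ, touched every two hours.
chain = [("ZZ", 900), ("ZZ", 1100), ("ZZ", 1300), ("ZZ", 1500)]
print(session_starts(chain))  # -> [('ZZ', 900), ('ZZ', 1300)]
```

On the question’s sample data this returns the same ten rows as the SQL versions, since no claim there has a chain of actions.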