I’m trying to fix some errors on a big database of stock exchange data. One column (quantity) has the traded volume on each tick, and other column stores the cumulative volume (i.e., the sum of the previous ticks of the day). This second column is wrong in some cases (not a lot, so we can safely assume that no to adjacent ticks are wrong). So theoretically the fix is easy: just search for a tick where the cumulative volume decreases (this suffices), and then pick the cumulative volume from the last tick and sum the quantity
of the current tick. The thing is that i’ve been trying to get to work a query that does this in oracle, but i’m struggling due to my lack of expertise in sql. This is what i’ve gotten so far:
update
(
select m.cumulative_volume, q.cum_volume_ant, q.quantity from
market_data_intraday_trades m
join
(
select * from
(select
product_key,
sequence_number,
lead(product_key) over (order by product_key, sequence_number) as product_key_ant,
to_char(trade_date_time, 'yyyymmdd') as fecha,
to_char(lag(trade_date_time) over (order by product_key, sequence_number), 'yyyymmdd') as fecha_ant,
cumulative_volume,
lead(cumulative_volume) over (order by product_key, sequence_number) as cum_volume_ant,
cumulative_volume - lead(cumulative_volume) over (order by product_key, sequence_number) as dif
from market_data_intraday_trades)
where product_key = product_key_ant
and fecha = fecha_ant
and dif < 0
and rownum < 10
) q
on m.sequence_number = q.sequence_number
)
set m.cumulative_volume = q.cum_volume_ant + q.quantity
The current problem being that i can’t seem to be able to use quantities from the inner query in the outside calculations.
Perhaps all of this would be clearer and/or easier with temporal tables or pl/sql or cursors, but due to corporate policies, i have no priviledges to do that, just selects and updates.
I would be very grateful if you could point me in some direction to solve this.
Thanks in advance!
PS. Fecha is date in spanish, just in case 🙂
Here is some test data. As you can see the CUMULATIVE_VOLUME for the fourth row is wrong.
The easiest solution is just to update all the rows with the calculated running total:
However, if you have a lot of data and you really don’t want all those unnecessary updates then use the same query again to restricted the hits:
I tried Nicolas’s suggestion of using MERGE. If you are using 10g or higher, then this would work. You need a recent version of Oracle because 9i didn’t support MERGE with an UPDATE but no INSERT (and 8i didn’t support MERGE at all).
This solution is tidier then the other solution.