I have to process a large table (2.5B records) row by row in order to keep track of two variables. As one can imagine, this is quite slow. I am looking for ideas on how to tune this procedure. Thank you.
declare
cursor c_data is select /* +index(data data_pk) */ * from data order by data_id;
r_data c_data%ROWTYPE;
lst_b_prc number(15,8);
lst_a_prc number(15,8);
begin
open c_data;
loop
fetch c_data into r_data;
exit when c_data%NOTFOUND;
if r_data.BATS = 'B' then
lst_b_prc := r_data.PRC;
end if;
if r_data.BATS = 'A' then
lst_a_prc := r_data.PRC;
end if;
if r_data.BATS = 'T' then
insert into trans .... lst_a_prc , lst_b_prc
end if;
end loop;
close c_data;
end;
The issue really comes down to finding efficient sql to track the latest PRC value when BATS=’A’ and BATS=’B’ for each BATS=’T’ record.
If I understand your problem correctly, with a table of data like this:
You you want to insert one row for each T, using the last PRC value for A and B. Which would look something like this:
This query should work:
With so much data, you’ll probably want to use direct path writes and parallelism.
The performance will largely depend on whether the sorting for the analytic functions can be done in memory or on disk. Optimizing memory can be very difficult, you’ll probably need to work with a DBA to allow your process to use as much memory as possible without causing problems for other processes.