I have a very large table in oracle that contains 140+ million rows. Currently

Asked: June 16, 2026

I have a very large table in Oracle that contains 140+ million rows. Currently we are doing three full table scans on this table nightly, and using some of the results to populate a tmp table. That tmp table is then turned into a very large report (usually 140K+ lines).

The big table is called tasklog and has the following structure:
tasklog_id (number) – PK
document_id (number)
date_time_in (date)
+ a few more columns that aren’t relevant

There are millions of different document ids, each repeated between 1 and several hundred times. date_time_in is the time the entry was put into the database.

All of the full table scans look like this:

DECLARE
n_prevdocid     number;

cursor tasks is
   select * 
   from tasklog
   order by document_id, date_time_in DESC;

BEGIN

for tk in tasks
loop
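    -- n_prevdocid starts out NULL, and "NULL <> x" is never true,
    -- so test for NULL explicitly or the first document is skipped
    if n_prevdocid is null or n_prevdocid <> tk.document_id then
         -- *code snipped*

    end if;
    n_prevdocid := tk.document_id;  -- PL/SQL assignment is :=, not =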
end loop;

END;
/

So my question: is there a quick(ish) way to get a distinct list of document_ids, each with the row having the most recent date_time_in? This could dramatically speed up the whole thing. Or can anyone think of a better way of retrieving this data daily?

Things that may be relevant: rows are only ever inserted into this table, always with the current date and time, and no rows are ever updated or deleted. The table is not range partitioned, but I can’t see how that might help me. There are about 70k to 80k rows inserted daily.

1 Answer

Editorial Team · Answer 1 · 2026-06-16T05:54:18+00:00

I don’t think you’re going to get away from doing at least one full table scan. Avoiding it would only be efficient if the ratio of distinct document_ids to total records were pretty small. The clustering on document_id is also going to be very poor because of the way the data is generated and inserted.
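
Both of those claims can be checked cheaply from the data dictionary rather than by scanning the table. The following is a rough sketch only: it assumes tasklog lives in the current schema, that optimizer statistics on it are reasonably fresh, and (for the second query) that an index leading with document_id exists, none of which the question states.

```sql
-- Ratio of distinct document_ids to total rows, taken from optimizer
-- statistics (assumption: stats have been gathered recently).
select t.num_rows,
       c.num_distinct,
       round(c.num_distinct / nullif(t.num_rows, 0), 4) as distinct_ratio
from   user_tables t
join   user_tab_columns c
       on c.table_name = t.table_name
where  t.table_name  = 'TASKLOG'
and    c.column_name = 'DOCUMENT_ID';

-- Clustering factor of any index whose leading column is document_id.
-- A value close to the table's block count means well clustered;
-- a value close to num_rows means poorly clustered.
select i.index_name,
       i.clustering_factor,
       i.num_rows
from   user_indexes i
join   user_ind_columns ic
       on ic.index_name = i.index_name
where  i.table_name       = 'TASKLOG'
and    ic.column_name     = 'DOCUMENT_ID'
and    ic.column_position = 1;
```

If distinct_ratio comes out well above a few percent and the clustering factor is close to num_rows, an index-driven approach is unlikely to beat a single full scan.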

How about:

create table tmp nologging compress -- or pctfree 0
as
select ...
from   (
  select t.*,
         max(date_time_in) over (partition by document_id) max_date_time_in
  from   tasklog t)
where   date_time_in = max_date_time_in
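
One caveat with the max() form: if two rows for the same document_id share the same latest date_time_in, both come back. A possible variant that keeps exactly one row per document uses row_number() instead. This is a sketch, not a drop-in replacement: the column list is limited to the three columns named in the question, and breaking ties on tasklog_id assumes that a higher id means a later insert, which the question does not confirm.

```sql
-- One row per document_id: the most recent by date_time_in,
-- with ties broken by the highest tasklog_id (assumption).
select tasklog_id,
       document_id,
       date_time_in
from  (
  select t.tasklog_id,
         t.document_id,
         t.date_time_in,
         row_number() over (partition by t.document_id
                            order by t.date_time_in desc,
                                     t.tasklog_id   desc) rn
  from   tasklog t)
where  rn = 1;
```

This still reads the whole table once, but it replaces the ordered cursor loop with a single set-based pass.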

Possibly, having created tmp once, you could then optimise further refreshes by merging into it only the newer records. Something like:

merge into tmp
using (
  select ...
  from   (
    select t.*,
           max(date_time_in) over (partition by document_id) max_date_time_in
    from   tasklog t
    where  date_time_in > (select max(date_time_in) from tmp))
  where   date_time_in = max_date_time_in)
on ... blah blah
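
For illustration, here is one way the elided parts might be filled in. It is a sketch under assumptions: tmp is taken to hold only tasklog_id, document_id and date_time_in, one row per document_id, and the aliases d and s are made up for this example. It uses row_number() rather than max() so that the source never contains two rows for the same document_id, which a merge would reject.

```sql
merge into tmp d
using (
  select tasklog_id,
         document_id,
         date_time_in
  from  (
    select t.tasklog_id,
           t.document_id,
           t.date_time_in,
           row_number() over (partition by t.document_id
                              order by t.date_time_in desc,
                                       t.tasklog_id   desc) rn
    from   tasklog t
    -- only rows newer than anything already merged;
    -- returns no rows if tmp is empty, so tmp must be built once first
    where  t.date_time_in > (select max(date_time_in) from tmp))
  where  rn = 1) s
on (d.document_id = s.document_id)
when matched then update
  set d.tasklog_id   = s.tasklog_id,
      d.date_time_in = s.date_time_in
when not matched then insert (tasklog_id, document_id, date_time_in)
  values (s.tasklog_id, s.document_id, s.date_time_in);
```

Because rows are only ever inserted with the current date and time, the date filter picks up just the rows added since the last refresh. Whether that avoids a full scan of tasklog depends on there being an index on date_time_in (or suitable partitioning), which the question does not mention.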
