The Archive Base

Editorial Team
Asked: June 16, 2026 (2026-06-16T05:54:16+00:00)

I have a very large table in Oracle that contains 140+ million rows

I have a very large table in Oracle that contains 140+ million rows. Currently we run three full table scans against this table nightly and use some of the results to populate a tmp table. That tmp table is then turned into a very large report (usually 140K+ lines).

The big table is called tasklog and has the following structure:
tasklog_id (number) – PK
document_id (number)
date_time_in (date)
+ a few more columns that aren’t relevant

There are millions of different document ids, each repeated between one and several hundred times. date_time_in is the time the entry was put into the database.

All of the full table scans look like this:

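DECLARE
   n_prevdocid   number;   -- document_id of the previous row; NULL before the first row

   cursor tasks is
      select *
      from   tasklog
      order by document_id, date_time_in DESC;

BEGIN

   for tk in tasks
   loop
      -- the first row of each document_id group is its most recent entry;
      -- the NULL check lets the very first row through as well
      if n_prevdocid is null or n_prevdocid <> tk.document_id then
         -- *code snipped*
         null;   -- placeholder so the block compiles
      end if;
      n_prevdocid := tk.document_id;
   end loop;

END;
/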

So my question: is there a quick(ish) way to get a distinct list of document_ids, each with the row having the most recent date_time_in? This could dramatically speed up the whole thing. Or can anyone think of a better way of retrieving this data daily?

Things that may be relevant: this table only ever has rows inserted with the current date and time. It is not range partitioned, but I can’t see how that might help me. No rows are ever updated or deleted. About 70k – 80k rows are inserted daily.

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 5:54 am

    I don’t think that you’re going to get away from doing at least one full table scan. Avoiding the scan would only be efficient if the ratio of distinct document_ids to total records were pretty small, and the clustering on document_id is going to be very poor due to the way that the data is generated and inserted.

    How about:

    create table tmp nologging compress -- or pctfree 0
    as
    select ...
    from   (
      select t.*,
             -- latest date_time_in within each document_id
             max(date_time_in) over (partition by document_id) max_date_time_in
      from   tasklog t)
    where   date_time_in = max_date_time_in;
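
To see what the analytic max() filter returns, here is a small runnable sketch. It is not Oracle: it uses Python's built-in sqlite3 module as a stand-in (a SQLite build with window-function support is assumed), the sample rows are made up, and date_time_in is stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3

# SQLite stands in for Oracle here; table contents are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table tasklog ("
    " tasklog_id integer primary key,"
    " document_id integer,"
    " date_time_in text)"  # ISO-8601 text sorts chronologically
)
conn.executemany(
    "insert into tasklog values (?, ?, ?)",
    [
        (1, 100, "2026-06-01 09:00:00"),
        (2, 100, "2026-06-03 10:30:00"),
        (3, 200, "2026-06-02 08:15:00"),
        (4, 100, "2026-06-02 11:45:00"),
        (5, 300, "2026-06-04 07:00:00"),
        (6, 200, "2026-06-05 16:20:00"),
    ],
)

# Same shape as the query above: tag every row with the latest
# date_time_in of its document_id, then keep only the rows that match it.
latest_rows = conn.execute(
    """
    select tasklog_id, document_id, date_time_in
    from  (
      select t.*,
             max(date_time_in) over (partition by document_id) max_date_time_in
      from   tasklog t)
    where  date_time_in = max_date_time_in
    order by document_id
    """
).fetchall()

for row in latest_rows:
    print(row)
```

Each document_id comes back once, with its most recent row, from a single pass over the table. If two rows of one document shared the same latest date_time_in, both would be returned.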
    

    Possibly, having created this once, you could then optimise further refreshes by merging into this set only the newer records. Something like …

    merge into tmp
    using (
      select ...
      from   (
        select t.*,
               max(date_time_in) over (partition by document_id) max_date_time_in
        from   tasklog t
        where  date_time_in > (select max(date_time_in) from tmp))
      where   date_time_in = max_date_time_in)
    on ... blah blah
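
The incremental refresh idea can be tried out with the sketch below. Again this is sqlite3 standing in for Oracle, so the Oracle merge is replaced by SQLite's insert ... on conflict upsert, which is a swapped-in technique and not the statement above. The shape of tmp (one row per document_id), the coalesce() that lets the first run start from an empty tmp, and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# SQLite stands in for Oracle; needs a build with upsert and window functions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table tasklog (
      tasklog_id   integer primary key,
      document_id  integer,
      date_time_in text
    );
    -- hypothetical shape for tmp: one row per document_id
    create table tmp (
      document_id  integer primary key,
      tasklog_id   integer,
      date_time_in text
    );
    """
)

# Only rows newer than the newest date_time_in already in tmp are scanned;
# the latest of those per document_id is inserted or overwrites the old row.
REFRESH_SQL = """
insert into tmp (document_id, tasklog_id, date_time_in)
select document_id, tasklog_id, date_time_in
from  (
  select t.*,
         max(date_time_in) over (partition by document_id) max_date_time_in
  from   tasklog t
  where  date_time_in > (select coalesce(max(date_time_in), '') from tmp))
where  date_time_in = max_date_time_in
on conflict (document_id) do update
  set tasklog_id   = excluded.tasklog_id,
      date_time_in = excluded.date_time_in
"""


def refresh(connection):
    """Apply one incremental refresh and return tmp ordered by document_id."""
    connection.execute(REFRESH_SQL)
    return connection.execute(
        "select document_id, tasklog_id, date_time_in from tmp order by document_id"
    ).fetchall()


# Night 1: initial load (tmp is empty, so every row counts as new).
conn.executemany(
    "insert into tasklog values (?, ?, ?)",
    [
        (1, 100, "2026-06-01 09:00:00"),
        (2, 100, "2026-06-02 10:00:00"),
        (3, 200, "2026-06-02 11:00:00"),
    ],
)
first = refresh(conn)

# Night 2: only the two rows added since the last refresh are considered.
conn.executemany(
    "insert into tasklog values (?, ?, ?)",
    [
        (4, 100, "2026-06-03 08:00:00"),
        (5, 300, "2026-06-03 09:00:00"),
    ],
)
second = refresh(conn)
print(first)
print(second)
```

This relies on the properties stated in the question: rows are only ever inserted, with the current date and time, and never updated or deleted. A row that arrived late with a date_time_in at or before the newest value already in tmp would be missed by the filter.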
    