The Archive Base

Editorial Team
Asked: May 27, 2026

I am working on a data warehousing project and have implemented some ETL functions in packages. I first encountered a problem on my development laptop and thought it had something to do with my Oracle installation, but it has now “spread” to the production servers.
Two functions “sometimes” become incredibly slow. We have implemented a logging system that writes a row to a logging table every x rows. A function that usually needs about 10 seconds per chunk “sometimes” needs up to 3 minutes. After rebuilding some indexes and restarting the function, it is as quick as it used to be.
Unfortunately, I can’t tell exactly which index it is: restarting the function and building up the cursor it uses takes some time, and we do not have the time to check each index on its own. So I just rebuild all indexes that are potentially used by the function and restart it.

The affected functions use a cursor to select data from a table with about 50 million to 200 million rows, joined to a small table with about 50 to 500 rows. The join condition is a string comparison. We then use the primary key of the small table, obtained from the join, to update a foreign key on the main table. The update is done with a forall statement, which has proven to save a lot of time.

Here is a simplified version of the table structure of both tables:

CREATE TABLE "maintable" 
(   "pkmid" NUMBER(11,0) NOT NULL ENABLE, 
    "fkid" NUMBER(11,0), 
    "fkstring" NVARCHAR2(4) NOT NULL ENABLE, 
    CONSTRAINT "PK_MAINTABLE" PRIMARY KEY ("pkmid")
);

CREATE TABLE "smalltable" 
(   "pksid" NUMBER(11,0) NOT NULL ENABLE, 
    "pkstring" NVARCHAR2(4) NOT NULL ENABLE, 
    CONSTRAINT "PK_SMALLTABLE" PRIMARY KEY ("pksid")
);

Both tables have indexes on their string columns. Including the primary key indexes, I therefore rebuild four indexes each time the problem happens.

Our data arrives in such a way that only the fkstring in the maintable is available and the fkid is set to null. In a first step, we populate the small table. This takes only minutes and is done the following way:

INSERT INTO smalltable (pksid, pkstring)
    SELECT SEQ_SMALLTABLE.NEXTVAL, fkstring
        FROM 
        (
            SELECT DISTINCT mt.fkstring
                FROM maintable mt                        
            MINUS
            SELECT st.pkstring
                FROM smalltable st
        );
        commit;

This function never causes any trouble.

The following function does (it is a simplified version of the function – I have removed logging and exception handling and renamed some variables):

function f_set_fkid return varchar2 is

    cursor lCursor_MAINTABLE is
        SELECT MT.PKmID, st.pksid
            FROM maintable mt
            JOIN smalltable st ON (mt.fkstring = st.pkstring)
            WHERE mt.fkid IS NULL;
    lIndex number := 0;
    lExitLoop boolean := false;

    type lCursorType is table of lCursor_MAINTABLE%rowtype index by pls_integer;
    lCurrentRow lCursor_MAINTABLE%rowtype;
    lTempDataArray lCursorType;
    lCommitEvery constant number := 1000;

    begin

        open lCursor_MAINTABLE;
            loop

                -- get next row, set exit condition
                fetch lCursor_MAINTABLE into lCurrentRow;
                if (lCursor_MAINTABLE%notfound) then
                    lExitLoop := true;
                end if;

                -- in case of cache being full, flush cache
                if ((lTempDataArray.count > 0) AND (lIndex >= lCommitEvery OR lExitLoop)) then
                    forall lIndex2 in lTempDataArray.FIRST..lTempDataArray.LAST
                        UPDATE maintable mt 
                            set fkid = lTempDataArray(lIndex2).pksid
                            WHERE mt.pkmid = lTempDataArray(lIndex2).pkmid;     
                    commit;
                    lTempDataArray.delete;
                    lIndex := 0;                           
                end if;                                                 

                -- data handling, fill cache
                if (lExitLoop = false) then
                    lIndex := lIndex + 1;                                 
                    lTempDataArray(lIndex) := lCurrentRow;
                end if;

                exit when lExitLoop;

            end loop;

        close lCursor_MAINTABLE; 

        return null;

    end;  

I would be very thankful for any help.

P.S. I do know that bulk collect into would speed up the function and probably simplify the code too, but at the moment we are content with the speed the function usually has. Changing the function to use bulk collect is planned for next year, but at the moment it is not an option (and I doubt it would solve this index problem).


1 Answer

    Editorial Team
    Added an answer on May 27, 2026 at 3:11 pm

    If you have a table where the number of rows fluctuates wildly (as it does during ETL loads), I would use the statistics of the fully loaded table throughout the load process.

    So, generate statistics when your table is fully loaded and then use those statistics for subsequent loads.
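
    As a rough sketch of that workflow, the standard DBMS_STATS package could be called once the table is fully loaded. This assumes the table is stored under the unquoted name MAINTABLE in the current schema (as the queries in the question suggest) and that locking statistics is acceptable in your environment:

```sql
BEGIN
    -- Gather table, column and index statistics on the fully loaded table.
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname => USER,           -- assumption: the table lives in the current schema
        tabname => 'MAINTABLE',    -- assumption: unquoted (uppercase) table name
        cascade => TRUE            -- also gather statistics for the table's indexes
    );

    -- Lock the statistics so that gathering jobs running during a later load
    -- do not replace them with numbers from a half-loaded table.
    DBMS_STATS.LOCK_TABLE_STATS(
        ownname => USER,
        tabname => 'MAINTABLE'
    );
END;
/
```

    To refresh the statistics after a later full load, either call DBMS_STATS.UNLOCK_TABLE_STATS with the same arguments first, or pass force => TRUE to GATHER_TABLE_STATS.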

    If you use statistics from when the table is half-loaded the optimizer may be tricked into not using indexes or not using the fastest index. This is especially true if data is loaded in order so that low value, high value and density are skewed.

    In your case, the statistics for columns fkstring and fkid are extra important since those two columns are heavily involved in the procedure that has performance issues.
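
    To see what the optimizer currently believes about those two columns, a query against the USER_TAB_COL_STATISTICS dictionary view may help. This is only a sketch; it assumes the table and columns are stored under their unquoted (uppercase) names in the current schema:

```sql
-- Column-level statistics the optimizer has for fkstring and fkid.
SELECT column_name,
       num_distinct,
       num_nulls,
       density,
       histogram,
       last_analyzed
  FROM user_tab_col_statistics
 WHERE table_name = 'MAINTABLE'              -- assumption: unquoted (uppercase) table name
   AND column_name IN ('FKSTRING', 'FKID');  -- assumption: unquoted column names
```

    If last_analyzed predates the current load, or num_nulls for FKID is far from the real number of rows still waiting for an fkid, the plan chosen for the cursor may be based on stale numbers.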
