I’m currently looking for a way to optimize a process that takes a long

Question


Editorial Team

Asked: May 30, 2026


I’m currently looking for a way to optimize a process that
takes a long time to run.

There are about 270 text files to be filtered.
Each file has about 70k~150k lines.
The reference table usually has about 16 million records, in Oracle 10g.
The process is run every hour.
Up to 9 instances of that process may run almost
simultaneously.

What I currently do is spool the reference table into a file, load that into
a hash, do the same with the text file, then match the hash keys.
Any record in the text file that is found in the reference list is discarded.

This is repeated for all 270 files; the spooling part, however, is only done
once at the start.

However, this approach consumes about 300 MB to 500 MB of RAM, and with the
possibility of multiple instances of that process running at almost
the same time, it's a nightmare for our server.

Any ideas how to do this better?


1 Answer

Editorial Team · Answer 1 · 2026-05-30T12:03:32+00:00

I’d suggest loading only the DB data into memory and streaming each text file line by line, like this (sorry, this is pseudocode, but you should get the idea and be able to implement it in Perl):

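// Load the reference keys once; this is the only large structure kept in memory.
HashSet dbData = GetDataFromDB();
foreach(filename in filenames) {
    string tmpfilename = filename + ".tmp";
    FileHandle handle = OpenRead(filename);
    FileHandle tmphandle = OpenWrite(tmpfilename);
    // Stream the file one line at a time instead of loading it into a hash.
    while(string line = handle.ReadLine()) {
        // Keep only the lines that are not in the reference set.
        if(!dbData.Contains(line)) {
            tmphandle.Write(line);
        }
    }
    tmphandle.Flush();
    tmphandle.Close();
    handle.Close();
    // Replace the original file with the filtered copy.
    Delete(filename);
    Rename(tmpfilename, filename);
}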

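This needs only about as much RAM as the reference table itself takes, because each text file is read and written one line at a time instead of being loaded into its own hash.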
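
For anyone who wants something runnable to experiment with, here is a minimal sketch of the same idea in Python; it should port to Perl almost line for line (a hash for the set, a `while (<$fh>)` loop for the streaming). The function names `load_reference_keys` and `filter_file` are made up for illustration, and the sketch assumes that the whole line is the key and that the reference table has already been spooled to a text file with one key per line. If your key is only one field of the line, you would extract that field before the lookup.

```python
import os


def load_reference_keys(path):
    """Read the spooled reference file into a set, one key per line (assumed format)."""
    with open(path, encoding="utf-8") as fh:
        return {line.rstrip("\n") for line in fh}


def filter_file(path, reference_keys):
    """Rewrite `path`, dropping every line whose text is in reference_keys.

    Only one line of the text file is held in memory at a time.
    Returns the number of lines kept.
    """
    tmp_path = path + ".tmp"
    kept = 0
    with open(path, encoding="utf-8") as src, \
            open(tmp_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Compare without the trailing newline, but write the line unchanged.
            if line.rstrip("\n") not in reference_keys:
                dst.write(line)
                kept += 1
    # Swap the filtered copy into place (replaces the Delete + Rename pair).
    os.replace(tmp_path, path)
    return kept
```

You would call `load_reference_keys` once at the start and then `filter_file` for each of the 270 files, so the set is built a single time per run.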
