I have a large (3Gb) binary file of doubles which I access (more or

Question

Asked: May 13, 2026 (2026-05-13T08:46:12+00:00)

I have a large (3 GB) binary file of doubles that I access (more or less) randomly during an iterative algorithm I have written for clustering data. Each iteration does about half a million reads from the file and about 100k writes of new values.

I create the FileChannel like this…

f = new File(_filename);
_ioFile = new RandomAccessFile(f, "rw");
_ioFile.setLength(_extent * BLOCK_SIZE);
_ioChannel = _ioFile.getChannel();

I then use a private ByteBuffer the size of a double to read from it

private ByteBuffer _double_bb = ByteBuffer.allocate(8);

and my reading code looks like this

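public double GetValue(long lRow, long lCol) 
{
    // Map (row, col) to a linear element index, then to a byte offset in the file.
    long idx = TriangularMatrix.CalcIndex(lRow, lCol);
    long position = idx * BLOCK_SIZE;
    double d = 0;
    try 
    {
        // Reset the shared buffer so the read can fill all 8 bytes.
        _double_bb.clear();
        // Positional read: does not move the channel's own file position.
        _ioChannel.read(_double_bb, position);
        d = _double_bb.getDouble(0);
    } 

    ...snip...

    return d;
}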

and I write to it like this…

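public void SetValue(long lRow, long lCol, double d) 
{
    // Map (row, col) to a linear element index, then to a byte offset in the file.
    long idx = TriangularMatrix.CalcIndex(lRow, lCol);
    long offset = idx * BLOCK_SIZE;
    try 
    {
        // Reset the shared buffer, then store the value at byte 0 (absolute put).
        _double_bb.clear();
        _double_bb.putDouble(0, d);
        // Positional write: does not move the channel's own file position.
        _ioChannel.write(_double_bb, offset);
    } 

    ...snip...

}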

The time taken for an iteration of my code increases roughly linearly with the number of reads. I have added a number of optimisations to the surrounding code to minimise the number of reads, but I am now down to the core set of reads that I feel is necessary without fundamentally altering how the algorithm works, which I want to avoid at the moment.

So my question is whether there is anything I can change in the read/write code or the JVM configuration to speed up the reads. I realise I can change hardware, but before I do that I want to make sure that I have squeezed every last drop of software juice out of the problem.

Thanks in advance

1 Answer

Editorial Team · Answer 1 · 2026-05-13T08:46:12+00:00

As long as your file is stored on a regular hard disk, the biggest possible speedup will come from organizing your data so that your accesses have locality, i.e. so that as many consecutive get/set calls as possible touch the same small area of the file.

This is more important than anything else you can do, because accessing random spots on a hard disk is by far the slowest thing a modern PC does: each seek takes milliseconds, which is roughly 10,000 times longer than almost any other operation.
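
If the algorithm knows several of the indices it needs before it needs the values, one way to get some of that locality without reorganizing the file is to read a batch in ascending offset order. The following is only a sketch under that assumption: SortedBatchReader and readBatch are hypothetical names, not part of the question's code, and the 8-byte block size is assumed from the question.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper: reads a batch of doubles in ascending file-offset order.
final class SortedBatchReader {
    // Assumption: each value occupies 8 bytes, as in the question.
    private static final int BLOCK_SIZE = Double.BYTES;

    /** Returns the values at the given element indices, in the caller's original order. */
    static double[] readBatch(FileChannel channel, long[] indices) throws IOException {
        // Sort the request slots by index so the file is visited front to back.
        Integer[] order = new Integer[indices.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingLong(i -> indices[i]));

        double[] result = new double[indices.length];
        ByteBuffer bb = ByteBuffer.allocate(BLOCK_SIZE);
        for (int slot : order) {
            bb.clear();
            long position = indices[slot] * BLOCK_SIZE;
            // A positional read may return fewer bytes than requested, so loop until full.
            while (bb.hasRemaining()) {
                int n = channel.read(bb, position + bb.position());
                if (n < 0) {
                    throw new IOException("Unexpected end of file at index " + indices[slot]);
                }
            }
            result[slot] = bb.getDouble(0);
        }
        return result;
    }
}
```

Whether this helps depends on how far ahead the algorithm can predict its reads; with batches of one it degenerates to the original access pattern.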

So if it’s possible to work on only one part of the dataset at a time (a part small enough to fit comfortably into the in-memory disk cache) and then combine the results, do that.
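
As a rough illustration of working on one part at a time, a contiguous range of values can be pulled into a double[] with a single bulk read, processed in memory, and written back in one go. This is a sketch, not the asker's code: ChunkIO, readChunk and writeChunk are hypothetical names, and it assumes the file uses Java's default big-endian layout, as the question's ByteBuffer does.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical helper: moves a contiguous run of doubles between file and memory.
final class ChunkIO {
    /** Reads count doubles starting at element index firstIndex. */
    static double[] readChunk(FileChannel channel, long firstIndex, int count) throws IOException {
        ByteBuffer bb = ByteBuffer.allocate(Math.multiplyExact(count, Double.BYTES));
        long position = firstIndex * Double.BYTES;
        // Loop because a single read call may deliver only part of the range.
        while (bb.hasRemaining()) {
            int n = channel.read(bb, position + bb.position());
            if (n < 0) {
                throw new EOFException("File ends before the requested chunk does");
            }
        }
        bb.flip();
        double[] values = new double[count];
        bb.asDoubleBuffer().get(values);
        return values;
    }

    /** Writes the values back, starting at element index firstIndex. */
    static void writeChunk(FileChannel channel, long firstIndex, double[] values) throws IOException {
        ByteBuffer bb = ByteBuffer.allocate(Math.multiplyExact(values.length, Double.BYTES));
        // The DoubleBuffer is a view, so bb itself stays at position 0 with limit = capacity.
        bb.asDoubleBuffer().put(values);
        long position = firstIndex * Double.BYTES;
        while (bb.hasRemaining()) {
            channel.write(bb, position + bb.position());
        }
    }
}
```

The chunk size is a tuning choice: large enough to amortize the seek, small enough that several chunks fit in memory at once.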

Alternatively, avoid the issue by storing your file on an SSD or (better) in RAM. Even storing it on a simple thumb drive could be a big improvement.
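
One way to approximate "in RAM" from Java without loading the file by hand is to memory-map it and let the operating system keep the hot pages cached. The sketch below assumes a 64-bit JVM and enough free memory for the OS to cache the file; MappedDoubleFile is a hypothetical class, and the file is split into segments because a single FileChannel.map call cannot cover more than Integer.MAX_VALUE bytes, which is less than 3 GB.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// Hypothetical wrapper: exposes a large file of doubles through several mapped segments.
final class MappedDoubleFile {
    private final int doublesPerSegment;
    private final MappedByteBuffer[] segments;

    /**
     * @param channel           a channel opened for both reading and writing
     * @param doubleCount       total number of doubles in the file
     * @param doublesPerSegment segment size; must stay below 2^28 so that each
     *                          segment is smaller than 2 GiB (e.g. 1 << 27 for 1 GiB)
     */
    MappedDoubleFile(FileChannel channel, long doubleCount, int doublesPerSegment) throws IOException {
        if (doublesPerSegment <= 0 || doublesPerSegment >= (1 << 28)) {
            throw new IllegalArgumentException("doublesPerSegment out of range");
        }
        this.doublesPerSegment = doublesPerSegment;
        int segmentCount = (int) ((doubleCount + doublesPerSegment - 1) / doublesPerSegment);
        segments = new MappedByteBuffer[segmentCount];
        for (int s = 0; s < segmentCount; s++) {
            long first = (long) s * doublesPerSegment;
            long length = Math.min(doublesPerSegment, doubleCount - first);
            segments[s] = channel.map(FileChannel.MapMode.READ_WRITE,
                    first * Double.BYTES, length * Double.BYTES);
        }
    }

    double get(long idx) {
        int segment = (int) (idx / doublesPerSegment);
        int byteOffset = (int) (idx % doublesPerSegment) * Double.BYTES;
        return segments[segment].getDouble(byteOffset);
    }

    void set(long idx, double d) {
        int segment = (int) (idx / doublesPerSegment);
        int byteOffset = (int) (idx % doublesPerSegment) * Double.BYTES;
        segments[segment].putDouble(byteOffset, d);
    }
}
```

With this in place, GetValue and SetValue would reduce to get(idx) and set(idx, d) on the computed index. If the machine has less free memory than the file size, random accesses still fall through to the disk, so this does not replace the locality advice above.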
