I am trying to benchmark how fast can Java do a simple task: read

Question

0

Asked: May 25, 20262026-05-25T15:13:23+00:00 2026-05-25T15:13:23+00:00

I am trying to benchmark how fast can Java do a simple task: read

0

I am trying to benchmark how fast can Java do a simple task: read a huge file into memory and then perform some meaningless calculations on the data. All types of optimizations count. Whether it’s rewriting the code differently or using a different JVM, tricking JIT ..

Input file is a 500 million long list of 32 bit integer pairs separated by a comma. Like this:

44439,5023
33140,22257
…

This file takes 5.5GB on my machine. The program can’t use more than 8GB of RAM and can use only a single thread.

package speedracer;

import java.io.FileInputStream;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class Main
{
    public static void main(String[] args)
    {
        int[] list = new int[1000000000];

        long start1 = System.nanoTime();
        parse(list);
        long end1 = System.nanoTime();

        System.out.println("Parsing took: " + (end1 - start1) / 1000000000.0);

        int rs = 0;
        long start2 = System.nanoTime();

        for (int k = 0; k < list.length; k++) {
            rs = calc(list[k++], list[k++], list[k++], list[k]);
        }

        long end2 = System.nanoTime();

        System.out.println(rs);
        System.out.println("Calculations took: " + (end2 - start2) / 1000000000.0);
    }

    public static int calc(final int a1, final int a2, final int b1, final int b2)
    {
        int c1 = (a1 + a2) ^ a2;
        int c2 = (b1 - b2) << 4;

        for (int z = 0; z < 100; z++) {
            c1 ^= z + c2;
        }

        return c1;
    }

    public static void parse(int[] list)
    {
        FileChannel fc = null;
        int i = 0;

        MappedByteBuffer byteBuffer;

        try {
            fc = new FileInputStream("in.txt").getChannel();

            long size = fc.size();
            long allocated = 0;
            long allocate = 0;

            while (size > allocated) {

               if ((size - allocated) > Integer.MAX_VALUE) {
                   allocate = Integer.MAX_VALUE;
               } else {
                   allocate = size - allocated;
               }

               byteBuffer = fc.map(FileChannel.MapMode.READ_ONLY, allocated, allocate);
               byteBuffer.clear();

               allocated += allocate;

               int number = 0;

               while (byteBuffer.hasRemaining()) {
                   char val = (char) byteBuffer.get();
                   if (val == '\n' || val == ',') {
                        list[i] = number;

                        number = 0;
                        i++;
                   } else {
                       number = number * 10 + (val - '0');
                   }
                }
            }

            fc.close();

        } catch (Exception e) {
            System.err.println("Parsing error: " + e);
        }
    }
}

I’ve tried all I could think of. Trying different readers, tried openjdk6, sunjdk6, sunjdk7. Tried different readers. Had to do some ugly parsing since MappedByteBuffer cannot map more than 2GB of memory at once. I’m running:

   Linux AS292 2.6.38-11-generic #48-Ubuntu SMP 
   Fri Jul 29 19:02:55 UTC 2011 
   x86_64 GNU/Linux. Ubuntu 11.04. 
   CPU: is Intel(R) Core(TM) i5-2410M CPU @ 2.30GHz.

Currently, my results are for parsing: 26.50s, calculations: 11.27s. I’m competing against a similar C++ benchmark which does the IO in roughly the same time but the calculations take only 4.5s. My main objective is to reduce the calculation time in any means possible. Any ideas?

Update: It seems the main speed improvement could come from what is called Auto-Vectorization. I was able to find some hints that the current Sun’s JIT only does “some vectorization” however I can’t really confirm it. It would be great to find some JVM or JIT that would have better auto-vectorization optimization support.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-25T15:13:24+00:00

First of all, -O3 enables:

-finline-functions
-ftree-vectorize

among others…

So it looks like it actually might be vectorizing.

EDIT :
This has been been confirmed. (see comments) The C++ version is indeed being vectorized by the compiler. With vectorization disabled, the C++ version actually runs a bit slower than the Java version

Assuming the JIT does not vectorize the loop, it may be difficult/impossible for the Java version to match the speed of the C++ version.

Now, if I were a smart C/C++ compiler, here’s how I would arrange that loop (on x64):

int c1 = (a1 + a2) ^ a2;
int c2 = (b1 - b2) << 4;

int tmp0 = c1;
int tmp1 = 0;
int tmp2 = 0;
int tmp3 = 0;

int z0 = 0;
int z1 = 1;
int z2 = 2;
int z3 = 3;

do{
    tmp0 ^= z0 + c2;
    tmp1 ^= z1 + c2;
    tmp2 ^= z2 + c2;
    tmp3 ^= z3 + c2;
    z0 += 4;
    z1 += 4;
    z2 += 4;
    z3 += 4;
}while (z0 < 100);

tmp0 ^= tmp1;
tmp2 ^= tmp3;

tmp0 ^= tmp2;

return tmp0;

Note that this loop is completely vectorizable.

Even better, I would completely unroll this loop. These are things that a C/C++ compiler will do. But now the question, is will the JIT do it?

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am trying to benchmark how fast can Java do a simple task: read

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply