In Java, using the following function for a huge matrix X to print its


Asked: June 8, 2026


In Java, using the following function for a huge matrix X to print its column-distinct elements:

// create the list of distinct values
List<Integer> values = new ArrayList<Integer>();

// X is n * m int[][] matrix
for (int j = 0, x; j < m; j++) {
    values.clear();
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (values.contains(x)) continue;
        System.out.println(x);
        values.add(x);
    }
}

First I iterate by columns (index j) and inside by rows (index i).

This function will be called millions of times for different matrices, so the code should be optimized to meet the performance requirements. I'm wondering about the values list. Would it be faster to replace values.clear() with values = new ArrayList<Integer>(); or values = null?


1 Answer

Editorial Team · Answer 1 · 2026-06-08T21:08:24+00:00

It would be much more efficient to use a Set instead of a List, for example the HashSet implementation. Its contains method runs in O(1) on average, instead of O(n) for a list. You can also save one call by calling only the add method, which returns false when the element is already in the set.

As for your specific question, I would just create a new Set on each iteration of the outer loop. Object creation is not that expensive, and with a set of the right size it is cheaper than clearing the set (as confirmed by the benchmark at the bottom; see the most efficient version in EDIT 2):

for (int j = 0, x; j < m; j++) {
    Set<Integer> values = new HashSet<Integer>();
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (!values.add(x)) continue; // values.add returns true if the element was NOT in the set before
        System.out.println(x);
    }
}

However, the only way to know which is quicker (new object vs. clear) is to profile that portion of your code and check the performance of both versions.
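
A rough way to compare the two options is a small timing harness like the sketch below. Everything in it is an assumption made for illustration: the class name ClearVsNewBenchmark, the matrix size, the random data and the number of rounds. The println call is replaced by a counter so that I/O does not dominate the measurement, and the early rounds act as JVM warm-up. A dedicated harness such as JMH would give more trustworthy numbers.

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class ClearVsNewBenchmark {

    // Variant A: allocate a fresh set for every column.
    // Returns the distinct count so the JIT cannot discard the work.
    static long countWithNewSet(int[][] X, int n, int m) {
        long distinct = 0;
        for (int j = 0; j < m; j++) {
            Set<Integer> values = new HashSet<>();
            for (int i = 0; i < n; i++) {
                if (values.add(X[i][j])) distinct++;
            }
        }
        return distinct;
    }

    // Variant B: reuse one set and clear it after each column.
    static long countWithClear(int[][] X, int n, int m) {
        long distinct = 0;
        Set<Integer> values = new HashSet<>();
        for (int j = 0; j < m; j++) {
            for (int i = 0; i < n; i++) {
                if (values.add(X[i][j])) distinct++;
            }
            values.clear();
        }
        return distinct;
    }

    public static void main(String[] args) {
        int n = 1000, m = 1000;        // hypothetical matrix size
        Random rnd = new Random(42);   // fixed seed for repeatable data
        int[][] X = new int[n][m];
        for (int[] row : X) {
            for (int j = 0; j < m; j++) row[j] = rnd.nextInt(500);
        }

        long sink = 0;
        for (int round = 0; round < 10; round++) { // early rounds serve as warm-up
            long t0 = System.nanoTime();
            sink += countWithNewSet(X, n, m);
            long t1 = System.nanoTime();
            sink += countWithClear(X, n, m);
            long t2 = System.nanoTime();
            System.out.printf("round %d: new=%d ms, clear=%d ms%n",
                    round, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000);
        }
        System.out.println("checksum: " + sink);
    }
}
```

The result depends on n, m and the number of distinct values per column, so the harness should be run on data that resembles the real matrices.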

EDIT

I ran a quick benchmark and the clear version seems a little faster (by about 20%) than creating a default-sized set on each iteration of the outer loop. You should still check which one is better on your dataset / use case. Faster code with my dataset:

Set<Integer> values = new HashSet<Integer>();
for (int j = 0, x; j < m; j++) {
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (!values.add(x)) continue; // values.add returns true if the element was NOT in the set before
        System.out.println(x);
    }
    values.clear();
}

EDIT 2

An even faster version is obtained by creating a new set of the right size on each iteration of the outer loop:

for (int j = 0, x; j < m; j++) {
    Set<Integer> values = new HashSet<Integer>(n, 1); //right size from the beginning
    for (int i = 0; i < n; i++) {
        x = X[i][j];
        if (!values.add(x)) continue; // values.add returns true if the element was NOT in the set before
        System.out.println(x);
    }
}
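
For testing outside the original program, the EDIT 2 loop can be wrapped in a self-contained class, as in the sketch below. The class name ColumnDistinct and the method distinctPerColumn are hypothetical, and the method collects the values into lists instead of printing them so that the result can be checked. It also assumes that X is rectangular.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ColumnDistinct {

    // Hypothetical helper: returns, for each column of X, its distinct values
    // in order of first appearance (the order in which the loop would print them).
    static List<List<Integer>> distinctPerColumn(int[][] X) {
        int n = X.length;
        int m = (n == 0) ? 0 : X[0].length; // assumes a rectangular matrix
        List<List<Integer>> result = new ArrayList<>();
        for (int j = 0; j < m; j++) {
            // Capacity n with load factor 1: sized for a full column up front
            Set<Integer> values = new HashSet<>(n, 1f);
            List<Integer> column = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                int x = X[i][j];
                // add returns true only if x was not already in the set
                if (values.add(x)) {
                    column.add(x);
                }
            }
            result.add(column);
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] X = {
            {1, 2, 3},
            {1, 5, 3},
            {4, 2, 3}
        };
        System.out.println(distinctPerColumn(X)); // prints [[1, 4], [2, 5], [3]]
    }
}
```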

Summary of results

After JVM warm up + JIT:

Set<Integer> values = new HashSet<Integer>(n, 1); =====> 280 ms
values.clear();                                   =====> 380 ms
Set<Integer> values = new HashSet<Integer>();     =====> 450 ms
