Question

Asked: May 12, 2026

I have the following code in the innermost loop of my program:

struct V {
  float val [200]; // 0 <= val[i] <= 1
};

V a[600];
V b[250];
V c[250];
V d[350];
V e[350];

// ... init values in a,b,c,d,e ...

int findmax(int ai, int bi, int ci, int di, int ei) {
  float best_val = 0.0;
  int best_ii = -1;

  for (int ii = 0; ii < 200; ii++) {
    float act_val =
      a[ai].val[ii] +
      b[bi].val[ii] +
      c[ci].val[ii] +
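      d[di].val[ii] +  // index d with di (the original used ci, leaving di unused)
      e[ei].val[ii];   // index e with ei (the original used ci, leaving ei unused)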

    if (act_val > best_val) {
      best_val = act_val;
      best_ii = ii;
    }
  }

  return best_ii;
}

I don’t care whether the answer is some clever algorithm (that would be the most interesting), C++ tricks, intrinsics, or assembler. I just need to make the findmax function more efficient.

Big thanks in advance.

Edit:
It seems that the branch is the slowest operation (misprediction?).

1 Answer

Editorial Team · Answer 1 · 2026-05-12T10:35:33+00:00

I see no obvious room for algorithmic optimization. In theory, you could stop summing the five values for an index as soon as it is clear that the current maximum cannot be exceeded, but that would add far too much overhead for a sum of only five numbers. You could try multiple threads and assign each one a range of indices, but with only 200 very short work items you have to weigh that against the thread creation overhead.
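For illustration only, the early-exit idea could look roughly like the sketch below. It relies on the bound 0 <= val[i] <= 1 from the question: once some terms are added, the remaining ones can contribute at most 1 each. The function name findmax_pruned and the pointer-based signature are hypothetical (the question indexes global arrays instead), and in rare rounding edge cases the bound check may decide differently from the plain loop. The extra branches are exactly the overhead described above, so this is unlikely to be faster in practice.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not from the original post: takes the five rows
// directly instead of indexing the global arrays a..e.
// Assumes every value lies in [0, 1], as stated in the question.
int findmax_pruned(const float* a, const float* b, const float* c,
                   const float* d, const float* e, std::size_t n) {
    assert(a && b && c && d && e);

    float best_val = 0.0f;
    int best_ii = -1;

    for (std::size_t ii = 0; ii < n; ++ii) {
        // After adding k terms, the remaining (5 - k) terms add at most (5 - k).
        float partial = a[ii] + b[ii];
        if (partial + 3.0f <= best_val) continue;  // cannot beat the current best
        partial += c[ii];
        if (partial + 2.0f <= best_val) continue;
        partial += d[ii];
        if (partial + 1.0f <= best_val) continue;
        partial += e[ii];

        if (partial > best_val) {
            best_val = partial;
            best_ii = static_cast<int>(ii);
        }
    }
    return best_ii;  // -1 if no sum is greater than 0, like the original loop
}
```

With the arrays from the question, a call would look like findmax_pruned(a[ai].val, b[bi].val, c[ci].val, d[di].val, e[ei].val, 200).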

So I would say that your best bet is assembler with MMX or SSE instructions on x86, or perhaps a (machine-specific) C++ library that provides access to these instructions.
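As a rough sketch of the SSE route, the version below uses compiler intrinsics from emmintrin.h (SSE2, x86 and x86-64 only) rather than hand-written assembler. It sums four indices per iteration and tracks the best value and index per lane with a compare mask instead of a branch, which may also help with the misprediction mentioned in the question. The name findmax_sse and the pointer-based signature are assumptions for illustration, and the sketch only handles lengths that are a multiple of 4 (200 qualifies). I have not benchmarked it.

```cpp
#include <cassert>
#include <cstddef>
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)

// Hypothetical helper, not from the original post.
int findmax_sse(const float* a, const float* b, const float* c,
                const float* d, const float* e, std::size_t n) {
    assert(n % 4 == 0);  // this sketch processes whole groups of 4 floats

    __m128 best_val = _mm_setzero_ps();             // per-lane best sum
    __m128i best_idx = _mm_set1_epi32(-1);          // per-lane best index
    __m128i cur_idx = _mm_setr_epi32(0, 1, 2, 3);   // indices in this group
    const __m128i step = _mm_set1_epi32(4);

    for (std::size_t ii = 0; ii < n; ii += 4) {
        // Same summation order as the scalar loop: (((a + b) + c) + d) + e.
        __m128 sum = _mm_add_ps(_mm_loadu_ps(a + ii), _mm_loadu_ps(b + ii));
        sum = _mm_add_ps(sum, _mm_loadu_ps(c + ii));
        sum = _mm_add_ps(sum, _mm_loadu_ps(d + ii));
        sum = _mm_add_ps(sum, _mm_loadu_ps(e + ii));

        // Mask is all ones in lanes where sum > best_val (strictly).
        const __m128i gt = _mm_castps_si128(_mm_cmpgt_ps(sum, best_val));
        // Branch-free select: take cur_idx where gt is set, else keep best_idx.
        best_idx = _mm_or_si128(_mm_and_si128(gt, cur_idx),
                                _mm_andnot_si128(gt, best_idx));
        best_val = _mm_max_ps(best_val, sum);
        cur_idx = _mm_add_epi32(cur_idx, step);
    }

    // Reduce the four lanes; on ties prefer the smallest index so the result
    // matches the scalar loop, which keeps the first maximum it sees.
    float vals[4];
    int idxs[4];
    _mm_storeu_ps(vals, best_val);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(idxs), best_idx);

    float best = 0.0f;
    int best_ii = -1;
    for (int lane = 0; lane < 4; ++lane) {
        if (idxs[lane] < 0) continue;  // lane never saw a sum above 0
        if (vals[lane] > best || (vals[lane] == best && idxs[lane] < best_ii)) {
            best = vals[lane];
            best_ii = idxs[lane];
        }
    }
    return best_ii;
}
```

The unaligned loads (_mm_loadu_ps) keep the sketch safe for any float array; if the rows are known to be 16-byte aligned, aligned loads may be slightly faster on older CPUs. On 32-bit x86 targets the compiler may need a flag such as -msse2.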
