I have the following code that I use to compute the distance between two

Question

0

Asked: May 29, 20262026-05-29T22:36:11+00:00 2026-05-29T22:36:11+00:00

I have the following code that I use to compute the distance between two

0

I have the following code that I use to compute the distance between two vectors:

double dist(vector<double> & vecA, vector<double> & vecB){
    double curDist = 0.0;
    for (size_t i = 0; i < vecA.size(); i++){
        double dif = vecA[i] - vecB[i];
        curDist += dif * dif;
    }

    return curDist;
}

This function is a major bottleneck in my application since it relies on a lot of distance calculations, consuming more than 60% of CPU time on a typical input. Additionally, the following line:

double dif = vecA[i] - vecB[i];

is responsible for more than 77% of CPU time in this function. My question is: is it possible to somehow optimize this function?

Notes:

To profile my application I have used Intel Amplifier XE;
Reducing the number of distance computations is not a feasible solution for
me;

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-29T22:36:12+00:00

There are two possible issues I can think of right now:

This computation is memory bound.
There is an iteration-to-iteration dependency on curDist.

This computation is memory bound.

Your dataset is larger than your CPU cache. So in this case, no amount of optimization is going to help unless you can restructure your algorithm.

There is an iteration-to-iteration dependency on curDist.

You have a dependency on curDist. This will block vectorization by the compiler. (Also, don’t always trust the profiler numbers to the line. They can be inaccurate especially after compiler optimizations.)

Normally, the compiler vectorizer can split up the curDist into multiple partial sums to and unroll/vectorize the loop. But it can’t do that under strict-floating-point behavior. You can try relaxing your floating-point mode if you haven’t already. Or you can split the sum and unroll it yourself.

For example, this kind of optimization is something the compiler can do with integers, but not necessarily with floating-point:

double curDist0 = 0.0;
double curDist1 = 0.0;
double curDist2 = 0.0;
double curDist3 = 0.0;
for (size_t i = 0; i < vecA.size() - 3; i += 4){
    double dif0 = vecA[i + 0] - vecB[i + 0];
    double dif1 = vecA[i + 1] - vecB[i + 1];
    double dif2 = vecA[i + 2] - vecB[i + 2];
    double dif3 = vecA[i + 3] - vecB[i + 3];
    curDist0 += dif0 * dif0;
    curDist1 += dif1 * dif1;
    curDist2 += dif2 * dif2;
    curDist3 += dif3 * dif3;
}

//  Do some sort of cleanup in case (vecA.size() % 4 != 0)

double curDist = curDist0 + curDist1 + curDist2 + curDist3;

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have the following code that I use to compute the distance between two

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply