My code relies heavily on computing distances between two points in 3D space.
To avoid the expensive square root I use the squared distance throughout.
But still it takes up a major fraction of the computing time and I would like to replace my simple function with something even faster.
I now have:
double distance_squared(double *a, double *b)
{
double dx = a[0] - b[0];
double dy = a[1] - b[1];
double dz = a[2] - b[2];
return dx*dx + dy*dy + dz*dz;
}
I also tried using a macro to avoid the function call but it doesn’t help much.
#define DISTANCE_SQUARED(a, b) ((a)[0]-(b)[0])*((a)[0]-(b)[0]) + ((a)[1]-(b)[1])*((a)[1]-(b)[1]) + ((a)[2]-(b)[2])*((a)[2]-(b)[2])
I thought about using SIMD instructions but could not find a good example or complete list of instructions (ideally some multiply+add on two vectors).
GPU’s are not an option since only one set of points is known at each function call.
What would be the fastest way to compute the distance squared?
A good compiler will optimize that about as well as you will ever manage. A good compiler will use SIMD instructions if it deems that they are going to be beneficial. Make sure that you turn on all such possible optimizations for your compiler. Unfortunately, vectors of dimension 3 don’t tend to sit well with SIMD units.
I suspect that you will simply have to accept that the code produced by the compiler is probably pretty close to optimal and that no significant gains can be made.