Something like this:
_declspec(align(16)) float dens[4];
// ... other code here; S_START, Pos and _Vector are all F32vec4
*((__m128*)dens) = (S_START - Pos) *_Vector;
float steps = max(max(dens[3], dens[2]), max(dens[1], dens[0]));
How do I do this directly using SSE?
There’s no easy way to do this. SSE isn’t really designed for horizontal operations (ones that combine the lanes of a single register), so you have to shuffle…
Here’s one approach:
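A minimal sketch of the shuffle-and-max idea, assuming an x86 target with SSE and the intrinsics from `<xmmintrin.h>`. The helper name `horizontal_max` is hypothetical, and this is only one possible shuffle sequence, not necessarily the fastest one:

```cpp
#include <xmmintrin.h>  // SSE intrinsics (__m128, _mm_shuffle_ps, _mm_max_ps, ...)

// Hypothetical helper: returns the largest of the four floats held in v.
static inline float horizontal_max(__m128 v) {
    // Move lanes 2 and 3 down into lanes 0 and 1.
    __m128 hi = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 3, 2));
    // Lane 0 = max(v0, v2), lane 1 = max(v1, v3).
    __m128 m1 = _mm_max_ps(v, hi);
    // Move lane 1 down into lane 0.
    __m128 sw = _mm_shuffle_ps(m1, m1, _MM_SHUFFLE(0, 0, 0, 1));
    // Lane 0 now holds the maximum of all four inputs.
    __m128 m2 = _mm_max_ps(m1, sw);
    // Final "store": copy lane 0 out as a scalar float.
    float result;
    _mm_store_ss(&result, m2);
    return result;
}
```

With the variables from the question, this would be called as something like `float steps = horizontal_max((S_START - Pos) * _Vector);`, assuming the `F32vec4` class converts to `__m128`.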
Note that the final store isn’t really supposed to be a store. It’s just a hack to get the value into the
float datatype. In reality no instruction is needed, because
float values are kept in the same SSE registers. (It’s just that the top 3 values are ignored.)