I'm rather new to assembly, and although the ARM Information Center is often helpful, the instruction descriptions can be a little confusing to a newbie. Basically, what I need to do is sum the 4 float values in a quadword register and store the result in a single-precision register. I think the VPADD instruction can do what I need, but I'm not quite sure.
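For reference, here is a minimal sketch of the four-lane reduction being asked about, written in C with NEON intrinsics rather than raw assembly. As far as I know, on ARMv7 NEON `VPADD` only works on 64-bit D registers, so the usual pattern is to add the two halves of the Q register first (`vadd.f32 d0, d0, d1`) and then do one pairwise add (`vpadd.f32 d0, d0, d0`), which leaves the total in `s0`. The function name `hsum4` and the scalar fallback are my own assumptions for illustration, not something from the question.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Sum the four float lanes of one quadword (Q) register.
   hsum4 is a hypothetical helper name used only for this sketch. */
static float hsum4(const float v[4])
{
    float32x4_t q = vld1q_f32(v);                                 /* q = {a, b, c, d}      */
    float32x2_t d = vadd_f32(vget_low_f32(q), vget_high_f32(q));  /* d = {a+c, b+d}        */
    d = vpadd_f32(d, d);                                          /* both lanes = a+b+c+d  */
    return vget_lane_f32(d, 0);                                   /* lane 0 is the S result */
}
#else
/* Scalar fallback for builds without NEON (an assumption of this sketch).
   It adds in the same order as the NEON path above. */
static float hsum4(const float v[4])
{
    float lo = v[0] + v[2];
    float hi = v[1] + v[3];
    return lo + hi;
}
#endif
```

Two `vpadd.f32` instructions in a row (`vpadd.f32 d0, d0, d1` followed by `vpadd.f32 d0, d0, d0`) should give the same total, just with the lanes added in a different order.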
It seems that you want the sum of an array of some length, not just four float values.
In that case, your code will work, but it is far from optimized:
many pipeline interlocks
an unnecessary 32-bit addition per iteration
Assuming the length of the array is a multiple of 8 and at least 16:
I hope the rest of the code above is self-explanatory.
You will notice that this version is many times faster than your initial one.
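As a rough illustration of the idea (this is not the answer's original assembly), here is a C sketch using NEON intrinsics. It keeps two independent quadword accumulators so that consecutive additions do not wait on each other, and it does the horizontal reduction only once, after the loop. The function name `sum_array`, the choice of two accumulators, and the scalar fallback are assumptions made for this sketch. Whether it actually ends up "many times faster" depends on the core and the compiler, so measure before relying on it.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Sum n floats; n must be a multiple of 8 and at least 16,
   matching the assumption stated in the answer.
   sum_array is a hypothetical name used only for this sketch. */
static float sum_array(const float *p, size_t n)
{
    assert(n >= 16 && n % 8 == 0);

    /* Two independent accumulators: the add into acc1 does not
       depend on the add into acc0, which avoids back-to-back stalls. */
    float32x4_t acc0 = vld1q_f32(p);
    float32x4_t acc1 = vld1q_f32(p + 4);

    for (size_t i = 8; i < n; i += 8) {
        acc0 = vaddq_f32(acc0, vld1q_f32(p + i));
        acc1 = vaddq_f32(acc1, vld1q_f32(p + i + 4));
    }

    /* Horizontal reduction, done once after the loop. */
    float32x4_t acc = vaddq_f32(acc0, acc1);
    float32x2_t d = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
    d = vpadd_f32(d, d);
    return vget_lane_f32(d, 0);
}
#else
/* Scalar fallback for builds without NEON (an assumption of this sketch).
   It mirrors the lane-wise accumulation order of the NEON path. */
static float sum_array(const float *p, size_t n)
{
    assert(n >= 16 && n % 8 == 0);

    float acc[8];
    for (size_t k = 0; k < 8; ++k)
        acc[k] = p[k];

    for (size_t i = 8; i < n; i += 8)
        for (size_t k = 0; k < 8; ++k)
            acc[k] += p[i + k];

    float q[4];
    for (size_t k = 0; k < 4; ++k)
        q[k] = acc[k] + acc[k + 4];

    return (q[0] + q[2]) + (q[1] + q[3]);
}
#endif
```

Note that summing in lanes changes the order of the floating-point additions, so the result can differ in the last bits from a plain left-to-right scalar loop.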