I did the integration task with the FPU before, and now I’m struggling with SSE.
My main problem is that on the FPU stack there was the fsin instruction, which operates on the number at the top of the stack (st0).
Now I want to calculate the sine of all four numbers in XMM0, or calculate it somewhere else and move the results into XMM0. I’m using AT&T syntax.
I think the second idea is actually possible, but I don’t know how 🙂
Does anybody know how to do it?
Three options:
1. Use an existing library that computes sin on SSE vectors.
2. Write your own vector sin function using SSE.
3. Store the vector to memory, use fsin to compute the sine of each element, and load the results. This assumes that your stack is 16-byte aligned and has 16 bytes of space.

(1) is almost certainly your best bet performance-wise, and is also the easiest. If you have significant experience writing vector code and know a priori that the arguments fall into some range, you may be able to get better performance with (2). Using fsin will work, but it’s ugly and slow and not particularly accurate, if that matters.
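
For option (3), a minimal sketch in AT&T syntax might look like the following. It assumes x86-64 (on 32-bit, %esp and %ecx would take the place of %rsp and %rcx), that %rsp is 16-byte aligned as movaps requires, that the 16 bytes at (%rsp) are free to overwrite, and that the x87 stack has at least one free slot. The choice of %rcx as the counter and the local label 0 are illustrative only.

```
    movaps  %xmm0, (%rsp)          # spill the four floats to the stack
    mov     $3, %rcx               # index of the last element
0:  flds    (%rsp,%rcx,4)          # push element rcx onto the x87 stack
    fsin                           # st(0) = sin(st(0))
    fstps   (%rsp,%rcx,4)          # pop the result back into its slot
    sub     $1, %rcx               # step to the previous element
    jns     0b                     # loop until rcx goes negative
    movaps  (%rsp), %xmm0          # reload the four sines into xmm0
```

Note that fsin only accepts arguments with magnitude below 2^63; for anything larger it leaves st(0) unchanged and sets the C2 flag, so such inputs would need separate handling.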