I am trying to modify a piece of code that uses SSE (128-bit) calls so that it uses the 256-bit FMA feature on the Bulldozer Opteron. I can't seem to find the intrinsics for these calls.
Some questions on this forum have used these intrinsics (e.g. How to find the horizontal maximum in a 256-bit AVX vector).
I found this:
http://msdn.microsoft.com/en-us/library/gg445140.aspx
and http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/index.htm#intref_cls/common/intref_avx_fmadd_ps.htm
But I can't seem to find anything in the AMD developer docs.
You can find the intrinsics in the file fma4intrin.h. Here are the 256-bit instructions from this file, with some function attributes stripped. The __builtin* functions emit the FMA instruction that is part of their name. So if you want to find an intrinsic function name, look up the matching __builtin_instructionname after the return and use the surrounding wrapper function.
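As a rough sketch of how one of these wrappers is called: _mm256_macc_ps from fma4intrin.h computes a*b+c on eight packed floats. The helper name fmadd8 is made up for illustration, and the sketch assumes GCC with -mfma4 (GCC expects you to include x86intrin.h rather than fma4intrin.h directly). Without that flag it falls back to a plain loop, so it still builds on machines that lack FMA4.

```c
#include <stdio.h>

#if defined(__FMA4__)
#include <x86intrin.h>  /* pulls in fma4intrin.h when built with -mfma4 */
#endif

/* Hypothetical helper (not part of any library):
   out[i] = a[i] * b[i] + c[i] for 8 floats. */
static void fmadd8(const float *a, const float *b, const float *c, float *out)
{
#if defined(__FMA4__)
    /* Unaligned loads, so the caller's arrays need no special alignment. */
    __m256 va = _mm256_loadu_ps(a);
    __m256 vb = _mm256_loadu_ps(b);
    __m256 vc = _mm256_loadu_ps(c);
    /* 256-bit FMA4 multiply-accumulate: va * vb + vc */
    _mm256_storeu_ps(out, _mm256_macc_ps(va, vb, vc));
#else
    /* Scalar fallback when FMA4 is not enabled at compile time. */
    for (int i = 0; i < 8; ++i)
        out[i] = a[i] * b[i] + c[i];
#endif
}
```

To use it, fill three float[8] arrays, call fmadd8(a, b, c, out), and print out. Build with something like gcc -O2 -mavx -mfma4 to get the FMA4 path; a binary built that way will only run on a CPU that actually supports FMA4.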