I need to optimize my matrix multiplication by using SIMD/Intel SSE. The example code given looks like:
*x = (float*)memalign(16, size * sizeof(float));
However, I am using C++ and [found that][1], instead of malloc (before doing SIMD), I should use new. Now I'm further optimizing via SIMD/SSE, so I need aligned memory. So the question is: do I need memalign/_aligned_malloc, or is my array declared like
static float m1[SIZE][SIZE];
already aligned? (SIZE is an int)
Typically, such arrays would not be 16-byte aligned, although nothing in the C++ specification prevents your compiler from aligning such an array on a 16-byte boundary. Depending on which compiler you're using, there is usually a compiler-specific way to request that the array be aligned on a 16-byte boundary. For example, for gcc, you would add the aligned attribute to the declaration: __attribute__((aligned(16))).

Alternatively, you could use posix_memalign(), memalign(), or other aligned-allocation APIs available on your platform to get a block of memory with the desired alignment. As a worst case, you could even allocate memory using standard malloc() or operator new and then handle the alignment adjustment yourself.
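
To make two of these options concrete, here is a minimal sketch rather than a definitive implementation. It assumes a C++11 compiler (so the standard alignas specifier can stand in for the gcc attribute) and a hypothetical SIZE of 64. The names ALIGNMENT, is_aligned, AlignedBlock, and allocate_aligned_floats are made up for illustration and do not come from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

constexpr std::size_t SIZE = 64;       // hypothetical matrix dimension
constexpr std::size_t ALIGNMENT = 16;  // aligned SSE loads/stores need 16 bytes

// Option 1: request the alignment on the declaration itself.
// alignas is standard since C++11; the gcc-specific spelling would be
//   static float m1[SIZE][SIZE] __attribute__((aligned(16)));
// Rows other than the first are 16-byte aligned only because
// SIZE * sizeof(float) happens to be a multiple of 16 here.
alignas(ALIGNMENT) static float m1[SIZE][SIZE];

// Returns true if p lies on a multiple of `alignment` bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Option 2 (the worst case): over-allocate with malloc and round up by hand.
struct AlignedBlock {
    void*  raw;   // original pointer; this is the one to pass to std::free
    float* data;  // 16-byte-aligned pointer inside the raw block
};

inline AlignedBlock allocate_aligned_floats(std::size_t count) {
    // Reserve ALIGNMENT - 1 spare bytes so rounding up never leaves the block.
    void* raw = std::malloc(count * sizeof(float) + ALIGNMENT - 1);
    if (raw == nullptr) {
        return {nullptr, nullptr};
    }
    const std::uintptr_t mask = ALIGNMENT - 1;
    const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw);
    const std::uintptr_t aligned = (addr + mask) & ~mask;  // round up to next multiple
    return {raw, reinterpret_cast<float*>(aligned)};
}
```

With the manual approach you must keep the original pointer around and free that one (raw), not the adjusted one (data). Calling is_aligned(m1, 16) or is_aligned(block.data, 16) is a cheap way to confirm the alignment before using aligned SSE loads.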