Suppose I have a two-dimensional array in C++ under CUDA, stored in shared memory,
like so:
__shared__ float arr[4][4]; // C++ has a default row-major ordering
By default C++ will order the elements in arr in a row-major format.
That is, it will allocate a contiguous block of memory and store the elements in the order (0,0), (0,1), (0,2), (0,3), (1,0), (1,1), and so on.
Is there a way to tell the C++/CUDA compiler to arrange this in a column-major order?
Why don’t you just swap the indices you are using?
Instead of using arr[x][y], use arr[y][x].
I’m curious why you would like to do this. Maybe using cache memory could be helpful, but I can’t tell for sure without details.
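If swapping the indices at every access site feels error-prone, the swap can be hidden behind a small wrapper. The following is only a sketch: `ColMajor` and `HOST_DEVICE` are hypothetical names made up for this example, not anything CUDA provides. It assumes a fixed-size array and maps (row, col) to the offset col * Rows + row, which is the column-major layout asked about.

```cpp
#include <cstddef>

// Let the same code compile with nvcc (callable on the device) or with a
// plain C++ compiler. HOST_DEVICE is a hypothetical macro for this sketch.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper: a fixed-size 2D array stored in column-major order.
template <typename T, std::size_t Rows, std::size_t Cols>
struct ColMajor {
    T data[Rows * Cols];  // contiguous storage, one column after another

    // Element (row, col) lives at offset col * Rows + row.
    HOST_DEVICE T& operator()(std::size_t row, std::size_t col) {
        return data[col * Rows + row];
    }

    HOST_DEVICE const T& operator()(std::size_t row, std::size_t col) const {
        return data[col * Rows + row];
    }
};
```

Since the struct is a plain aggregate with no constructor, it should be possible to declare it inside a kernel as `__shared__ ColMajor<float, 4, 4> arr;` and access elements with `arr(row, col)`, though I have not verified that on every toolkit version.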
Hope it helps.