Basically I am solving the diffusion equation in 3D using FFT and one of the ways to parallelise this is to decompose the 3D FFT in 2D FFTs.
As described in this paper: https://cmb.ornl.gov/members/z8g/csproject-report.pdf
The way to decompose a 3d fft would be by doing:
2d fft in xy direction
global transpose
1d fft in z direction
Basically, my problem is that I am not sure how to do this global transpose (as I assume it’s transposing a 3d array I suppose). Anyone has came accross this? Thanks a lot.
Think of a 3d cube with
nx*ny*nzelements. The 3d FFT of these elements is mathematically 3 stages of 1-d FFTs, one along each axis:More generally, an N-dimensional FFT (N>1) is composed of many (N-1)-dimensional FFTs along that axis.
If the signal is real and you have an FFT that can return the half spectrum, then stage 1 would be about half as expensive (real FFT is cheaper), the remaining stages need to be complex, but they only need to have about half as many transforms. So the cost is roughly half.
If your 1d FFT can read input elements that are strided and pack the output into a contiguous buffer, then you end up doing a transposition at each stage.
This is how kissfft performs multi-dimensional FFTs.
P.S. When I need to get a mental pictures of higher dimensions, I think of:
sheets of paper with matrices of numbers (2d), in folders of numbered papers (3d), in numbered filing cabinets (4d), in numbered rooms (5d), in numbered buildings (6d), and so on … So I can visualize the “filing cabinet” dimension