I spent some time optimizing my algorithm and my quasi-serial (no explicit parralelization) code spends 95% of the time on a line that performs the fftn and dense single(float) matrix multiplication
for k=1:10
q = q + x{k}.* fftn( mArray{k}.* ifftn( mOther{k} .* z ) );
I tried adding some wisdoms for the FFT although the performance increase was negligible.
I am at a loss for ways to speed up this code, do you think compiling FFTW could result in a performance increase? I am using Matlab 2012b for a 3rd generation i7.
Edit
I seem to have made a typo, x depends on k, it would have been too easy otherwise. I was hoping somebody could speak to optimizing the actual fft.
q = q + x.* fftn( mArray{k}.* ifftn( mOther{k} .* z ) );
q = q + x{k}.* fftn( mArray{k}.* ifftn( mOther{k} .* z ) );
You should add the q’s together before fftn transforming and multiplying by x. e.g.
IMO this should be equivalent.