I want to use a GPU-accelerated algorithm, to perform a fast and memory saving

Asked: May 27, 2026

I want to use a GPU-accelerated algorithm to perform a fast, memory-saving DFT. However, when I call gpu::dft, the destination matrix is scaled, as explained in the documentation. How can I avoid the width being scaled to dft_size.width / 2 + 1? Also, why is it scaled like this? My code for the DFT is:

cv::gpu::GpuMat d_in, d_out;
d_in.upload(in);  // copy the host matrix to the GPU
d_out.create(d_in.size(), CV_32FC2);
cv::gpu::dft(d_in, d_out, d_in.size());

where in is a CV_32FC1 matrix, which is 512×512.

The best solution would be a destination matrix which has the size d_in.size and the type CV_32FC2.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T20:32:57+00:00

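This is due to the complex-conjugate symmetry in the FFT output of a real-valued input: roughly half of the coefficients are redundant, so only dft_size.width / 2 + 1 columns need to be stored. Intel IPP has a good description of this packing (the same packing is used by OpenCV). The documentation of the OpenCV dft function also describes this packing.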
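To make the symmetry concrete, here is a minimal sketch in plain C++ with no OpenCV dependency. It computes a brute-force DFT of a real 1-D signal and checks that X[N - k] equals the conjugate of X[k]. The helper names naiveDft and isConjugateSymmetric are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Brute-force O(N^2) DFT of a real signal (illustration only, not an FFT).
std::vector<std::complex<double>> naiveDft(const std::vector<double>& x) {
    assert(!x.empty());
    const std::size_t n = x.size();
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> X(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t t = 0; t < n; ++t) {
            const double angle =
                -2.0 * pi * static_cast<double>(k * t) / static_cast<double>(n);
            sum += x[t] * std::complex<double>(std::cos(angle), std::sin(angle));
        }
        X[k] = sum;
    }
    return X;
}

// True if X[n - k] == conj(X[k]) for every k in 1..n-1, within tol.
bool isConjugateSymmetric(const std::vector<std::complex<double>>& X, double tol) {
    const std::size_t n = X.size();
    for (std::size_t k = 1; k < n; ++k) {
        if (std::abs(X[n - k] - std::conj(X[k])) > tol) {
            return false;
        }
    }
    return true;
}
```

For a real signal of length N, the coefficients at indices 0 to N/2 determine all the others, which is where the width / 2 + 1 figure comes from.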

So, from the gpu::dft documentation we have:

If the source matrix is complex and the output is not specified as real, the destination matrix is complex and has the dft_size size and CV_32FC2 type.

So, make sure you pass a complex matrix to the gpu::dft function if you don’t want it to be packed. You will need to set the second channel to all zeros:

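Mat realData;  // e.g. CV_32FC1, as in the question

// ... get your real data...

// Imaginary channel: all zeros, same size and type as the real data.
Mat cplxData = Mat::zeros(realData.size(), realData.type());

// Combine the real and imaginary parts into one two-channel matrix.
vector<Mat> channels;
channels.push_back(realData);
channels.push_back(cplxData);

Mat fftInput;
merge(channels, fftInput);

// Upload the complex input to the GPU.
GpuMat fftGpu(fftInput.size(), fftInput.type());
fftGpu.upload(fftInput);

// do the gpu::dft here...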

There is a caveat, though: you get about a 30-40% performance boost when using CCS-packed data, so you will lose some performance by using the full-complex output.
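If you want to keep the packed transform and still end up with a full-size spectrum, the missing columns can in principle be rebuilt from the symmetry. The sketch below is plain C++ on host-side vectors, not OpenCV or GPU code. It assumes the packed result holds columns 0 to width / 2 of the full spectrum, which I believe is the layout of a real-to-complex transform, but check this against your OpenCV version. The names Spectrum, expandHalfSpectrum and naiveDft2D are hypothetical helpers for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using Spectrum = std::vector<std::vector<std::complex<float>>>;

// Rebuild a full rows x fullWidth spectrum from its non-redundant half.
// Assumes half[y][x] holds columns 0..fullWidth/2 of the DFT of a real image.
Spectrum expandHalfSpectrum(const Spectrum& half, std::size_t fullWidth) {
    const std::size_t rows = half.size();
    const std::size_t halfWidth = fullWidth / 2 + 1;
    Spectrum full(rows, std::vector<std::complex<float>>(fullWidth));
    for (std::size_t y = 0; y < rows; ++y) {
        assert(half[y].size() == halfWidth);
        for (std::size_t x = 0; x < fullWidth; ++x) {
            if (x < halfWidth) {
                full[y][x] = half[y][x];
            } else {
                // F(y, x) = conj(F((rows - y) mod rows, fullWidth - x))
                full[y][x] = std::conj(half[(rows - y) % rows][fullWidth - x]);
            }
        }
    }
    return full;
}

// Brute-force 2-D DFT of a real image, used only to check the expansion.
Spectrum naiveDft2D(const std::vector<std::vector<float>>& img) {
    const std::size_t rows = img.size();
    const std::size_t cols = img[0].size();
    const double pi = std::acos(-1.0);
    Spectrum F(rows, std::vector<std::complex<float>>(cols));
    for (std::size_t v = 0; v < rows; ++v) {
        for (std::size_t u = 0; u < cols; ++u) {
            std::complex<double> sum(0.0, 0.0);
            for (std::size_t y = 0; y < rows; ++y) {
                for (std::size_t x = 0; x < cols; ++x) {
                    const double angle = -2.0 * pi *
                        (static_cast<double>(v * y) / static_cast<double>(rows) +
                         static_cast<double>(u * x) / static_cast<double>(cols));
                    sum += static_cast<double>(img[y][x]) *
                           std::complex<double>(std::cos(angle), std::sin(angle));
                }
            }
            F[v][u] = std::complex<float>(static_cast<float>(sum.real()),
                                          static_cast<float>(sum.imag()));
        }
    }
    return F;
}
```

Whether this beats running the full complex transform depends on how the expansion is done. A host-side loop like this one would likely cost more than it saves, so treat it as an explanation of the layout rather than a tuned solution.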

Hope that helps!
