The Archive Base

Editorial Team
Asked: June 11, 2026

I know how the FFT implementation works (Cooley-Tukey algorithm), and I know that there is a CUFFT CUDA library that computes 1D or 2D FFTs quickly, but I'd like to know how CUDA parallelism is exploited in the process.

Is it related to the butterfly computation? For example, does each thread load part of the data into shared memory and then compute either an even term or an odd term?


1 Answer

  1. Editorial Team, added an answer on June 11, 2026 at 4:59 am

    I do not think they use the Cooley-Tukey algorithm, because its index permutation (bit-reversal) phase makes it inconvenient for shared-memory architectures. In addition, that algorithm works with power-of-two memory strides, which is bad for memory coalescing. Most likely they use some formulation of the Stockham self-sorting FFT, for example Bailey's algorithm.
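To make the "self-sorting" idea concrete, here is a minimal CPU sketch in C++ of a radix-2 Stockham FFT. It is only an illustration of the general technique, not CUFFT's actual code: the function name `stockham_fft`, the use of double precision, and the power-of-two length restriction are all assumptions made for this example. The point to notice is that there is no bit-reversal pass; each stage reads from one buffer and writes to the other, and the output ends up in natural order.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Radix-2 Stockham autosort FFT (forward, unnormalized).
// Hypothetical helper for illustration; assumes x.size() is a power of two.
std::vector<cd> stockham_fft(std::vector<cd> x) {
    const double pi = std::acos(-1.0);
    const std::size_t N = x.size();
    std::vector<cd> y(N);  // second buffer: stages ping-pong between x and y

    // n = length of the sub-transforms still to do, s = how many of them are interleaved.
    for (std::size_t n = N, s = 1; n > 1; n /= 2, s *= 2) {
        const std::size_t m = n / 2;
        // The N/2 (p, q) pairs of a stage are independent of each other,
        // so on a GPU each pair could be handled by its own thread.
        for (std::size_t p = 0; p < m; ++p) {
            const cd w = std::polar(1.0, -2.0 * pi * double(p) / double(n));
            for (std::size_t q = 0; q < s; ++q) {
                // Reads: butterfly index t = p*s + q touches x[t] and x[t + N/2].
                const cd a = x[q + s * p];
                const cd b = x[q + s * (p + m)];
                // Writes go to a different buffer, reordered on the fly.
                y[q + s * (2 * p)] = a + b;
                y[q + s * (2 * p + 1)] = (a - b) * w;
            }
        }
        std::swap(x, y);  // output of this stage is the input of the next
    }
    return x;  // natural order, no bit-reversal step needed
}
```

Because every stage is out of place, neighbouring butterflies read neighbouring elements, which is the access pattern that coalesced loads on a GPU favour. The price is the second buffer.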

    As for the implementation, you are right: one usually splits a large FFT into several smaller ones, each of which fits within one thread block. In my work, I used 512- or 1024-point FFTs (completely unrolled, of course) per thread block with 128 threads. Typically, you do not use a classical radix-2 algorithm on the GPU because of the large amount of data transfer it requires. Instead, one chooses a radix-8 or even radix-16 algorithm so that each thread performs one large “butterfly” at a time. For example implementations, you can also visit Vasily Volkov's page, or check this “classic” paper.
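The "one large butterfly per thread" idea can be sketched on the CPU as well. The example below uses radix 4 rather than radix 8 or 16 purely to keep it short, and the names `radix4_butterfly` and `fft_radix4`, the double precision, and the power-of-four length restriction are assumptions for illustration. The function `radix4_butterfly` stands in for the work of one thread: it loads four values, combines them locally (in registers on a GPU), and stores four results. The loop over `t` stands in for the threads of a block, with a buffer swap where a real kernel would synchronize.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cd = std::complex<double>;

// Work of one hypothetical thread t in one radix-4 Stockham stage.
// n = current sub-transform length, s = number of interleaved sub-transforms.
void radix4_butterfly(std::size_t t, std::size_t n, std::size_t s,
                      const std::vector<cd>& x, std::vector<cd>& y) {
    const double pi = std::acos(-1.0);
    const std::size_t n1 = n / 4;
    const std::size_t p = t / s, q = t % s;

    const cd w1 = std::polar(1.0, -2.0 * pi * double(p) / double(n));
    const cd w2 = w1 * w1, w3 = w2 * w1;

    // Four loads per thread ...
    const cd a = x[q + s * p];
    const cd b = x[q + s * (p + n1)];
    const cd c = x[q + s * (p + 2 * n1)];
    const cd d = x[q + s * (p + 3 * n1)];

    // ... a 4-point DFT computed entirely in local variables ...
    const cd apc = a + c, amc = a - c, bpd = b + d;
    const cd jbmd = cd(0.0, 1.0) * (b - d);

    // ... and four stores, with twiddle factors applied.
    y[q + s * (4 * p)] = apc + bpd;
    y[q + s * (4 * p + 1)] = w1 * (amc - jbmd);
    y[q + s * (4 * p + 2)] = w2 * (apc - bpd);
    y[q + s * (4 * p + 3)] = w3 * (amc + jbmd);
}

// Forward, unnormalized FFT; assumes x.size() is a power of four.
std::vector<cd> fft_radix4(std::vector<cd> x) {
    const std::size_t N = x.size();
    std::vector<cd> y(N);
    for (std::size_t n = N, s = 1; n > 1; n /= 4, s *= 4) {
        // N/4 independent butterflies per stage: one per emulated thread.
        for (std::size_t t = 0; t < N / 4; ++t) radix4_butterfly(t, n, s, x, y);
        std::swap(x, y);  // a GPU kernel would synchronize the block here
    }
    return x;
}
```

A 1024-point transform needs ten radix-2 stages but only five radix-4 stages, so the data makes half as many round trips through memory. The same arithmetic is consistent with the figures in the answer: a radix-8 stage of a 1024-point FFT has 1024 / 8 = 128 butterflies, one for each of 128 threads.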

© 2021 The Archive Base. All Rights Reserved