The Archive Base

Editorial Team
Asked: May 29, 2026

I am running CUFFT on chunks (N*N/p) divided in multiple GPUs, and I have


I am running CUFFT on chunks (N*N/p) divided across multiple GPUs, and I have a question about how to calculate the performance. First, a bit about how I am doing it:

  1. Send an N*N/p chunk to each GPU
  2. Run a batched 1-D FFT over the rows on each of the p GPUs
  3. Copy the N*N/p chunks back to the host and transpose the entire dataset
  4. Repeat step 1 with the transposed data
  5. Repeat step 2 with the transposed data

Gflops = (1e-9 * 5 * N * N * lg(N*N)) / execution time

and the execution time is calculated as:

execution time = Sum(memcpyHtoD + kernel + memcpyDtoH times for row and col FFT for each GPU)

Is this the correct way to evaluate CUFFT performance on multiple GPUs? Is there any other way I could represent the performance of FFT?

Thanks.


1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 11:54 pm

    If you are doing a complex transform, the operation count is correct (it should be 2.5 N log2(N) for a real-valued transform), but the GFLOP formula is incorrect because of the execution time it divides by. In a parallel, multiprocessor operation, the usual calculation of throughput is

    operation count / wall clock time
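
As a rough illustration of the operation count and the throughput formula above, here is a minimal Python sketch. The helper names `fft_flop_count` and `gflops` are hypothetical, and the 5 N log2(N) and 2.5 N log2(N) figures are the nominal counts quoted in this thread, not measured instruction counts.

```python
import math


def fft_flop_count(num_points: int, real_input: bool = False) -> float:
    """Nominal FLOP count for an FFT over num_points total points.

    Uses 5 * n * log2(n) for a complex transform and
    2.5 * n * log2(n) for a real-valued one.
    """
    factor = 2.5 if real_input else 5.0
    return factor * num_points * math.log2(num_points)


def gflops(num_points: int, wall_clock_seconds: float,
           real_input: bool = False) -> float:
    """Throughput = operation count / wall clock time, scaled to GFLOP/s."""
    return 1e-9 * fft_flop_count(num_points, real_input) / wall_clock_seconds


# Example with an assumed N = 1024, so the dataset holds N * N points,
# and an assumed (not measured) wall clock time of 10 ms.
N = 1024
example_gflops = gflops(N * N, wall_clock_seconds=0.01)
```

For the N*N-point dataset in the question, `num_points` is N*N, which reproduces the 5 * N * N * lg(N*N) term in the question's formula.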
    

    In your case, presuming the GPUs are operating in parallel, either measure the wall clock time (i.e., how long the whole operation took) and use that as the execution time, or use this:

    execution time = max(memcpyHtoD + kernel + memcpyDtoH times for row and col FFT for each GPU)
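
To make the difference between the two execution time formulas concrete, here is a small sketch that compares the sum over GPUs with the max over GPUs. The timings are made-up illustrative values in milliseconds, not measurements, and the names `per_gpu_ms` and `gpu_total` are hypothetical.

```python
def gpu_total(stage_times_ms):
    """Total time one GPU spends on its copies and kernels."""
    return sum(stage_times_ms)


# Illustrative per-GPU timings (ms), in the order:
# row pass  HtoD, kernel, DtoH, then column pass HtoD, kernel, DtoH.
per_gpu_ms = {
    "gpu0": [2.0, 5.0, 2.0, 2.0, 5.0, 2.0],
    "gpu1": [2.0, 6.0, 2.0, 2.0, 6.0, 2.0],
}

# Formula from the question: adds every GPU's time together,
# as if the GPUs ran one after another.
serial_ms = sum(gpu_total(t) for t in per_gpu_ms.values())   # 38.0

# Formula from the answer: the GPUs overlap, so the slowest one
# determines the elapsed time.
parallel_ms = max(gpu_total(t) for t in per_gpu_ms.values())  # 20.0
```

With these assumed numbers the summed time is 38 ms while the max is 20 ms, so the summed version would report roughly half the throughput. Neither formula includes the host-side transpose between the two passes; a wall clock measurement of the whole operation would.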
    

    As it stands, your calculation represents the serial execution time. Allowing for the overheads of the multi-GPU scheme, I would expect the calculated performance numbers you are getting to be lower than those of the equivalent transform done on a single GPU.
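
The wall clock option mentioned above can be sketched as a timer wrapped around the whole operation. This is only an outline: `run_pipeline` is a hypothetical callable standing in for the full copy, FFT, transpose, copy, FFT sequence, and it is assumed to return only after all GPU work has finished. If it returned while asynchronous GPU work was still in flight, the measured time would be too short.

```python
import time


def measure_wall_clock(run_pipeline) -> float:
    """Return the elapsed wall clock time, in seconds, of one full run.

    run_pipeline is a hypothetical callable that must block until
    every GPU has completed its work.
    """
    start = time.perf_counter()
    run_pipeline()
    return time.perf_counter() - start


# Stand-in workload so the sketch runs without any GPU code.
elapsed_seconds = measure_wall_clock(lambda: time.sleep(0.01))
```

The result can then be used directly as the execution time in the operation count / wall clock time calculation.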
