The Archive Base


Editorial Team
Asked: May 31, 2026

I ran some CUDA code that updated an array of floats. I have a wrapper function like the one discussed in the question “How can I compile CUDA code then link it to a C++ project?”

Inside my CUDA function I create a for loop like this…

int tid = threadIdx.x;
for(int i=0;i<X;i++)
{
     //code here
}

The issue is that if X is 100, everything works fine, but if X is 1000000, my vector does not get updated (almost as if the code inside the for loop never executes).

If I instead call the CUDA function in a for loop inside the wrapper function, it works, but it is significantly slower than doing the same work entirely on the CPU:

for(int i=0;i<1000000;i++)
{
      update<<<NumObjects,1>>>(dev_a, NumObjects);
}

Does anyone know why I can loop a million times in the wrapper function, but cannot call the CUDA “update” function once and run a for loop of a million iterations inside it?

1 Answer

    Editorial Team
    Added an answer on May 31, 2026 at 9:44 pm

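    After launching the kernel, call cudaGetLastError and cudaDeviceSynchronize (the replacement for the now-deprecated cudaThreadSynchronize) and check what they return to see whether an error occurred. My guess is that the single long-running launch timed out. This usually happens when the GPU also drives a display: a watchdog timer kills any kernel that runs for too long (typically a few seconds), so the kernel is aborted and the array is never updated.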
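
A minimal sketch of that check follows. The kernel body, its extra X parameter, and the helper name launchAndCheck are assumptions made for illustration, since the question does not show the real kernel; only the runtime calls (cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString) are actual CUDA API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the asker's kernel: adds 1.0f to each element X times.
__global__ void update(float *a, int n, int X)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    float v = a[idx];
    for (int i = 0; i < X; i++) {
        v += 1.0f;
    }
    a[idx] = v;
}

// Launches the kernel and reports launch-time and run-time errors.
// Returns true only if the kernel ran to completion.
bool launchAndCheck(float *dev_a, int n, int X)
{
    update<<<n, 1>>>(dev_a, n, X);

    // Errors in the launch itself (for example, an invalid configuration).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return false;
    }

    // Errors raised while the kernel was running, such as a watchdog timeout.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

If the watchdog is the cause, the second check should fail for the large X and succeed for the small one.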

    Second, the looped launches are much slower because every kernel launch carries a fixed overhead. With the loop inside the kernel, you pay that overhead once and then run the loop; with the loop in the wrapper, you pay it X times. The overhead per launch is small, but it adds up, so put as much of the loop inside the kernel as you can.
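
One way to see the launch overhead is to time the same total work split into a different number of launches. This is only a sketch: the kernel step and the helper timeLaunches are hypothetical names, and the measured numbers will depend on the GPU and driver.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1.0f to each element `iters` times.
__global__ void step(float *a, int n, int iters)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    float v = a[idx];
    for (int i = 0; i < iters; i++) {
        v += 1.0f;
    }
    a[idx] = v;
}

// Times `launches` kernel launches of `itersPerLaunch` iterations each.
// Returns the elapsed time in milliseconds, or -1.0f on error.
float timeLaunches(float *dev_a, int n, int launches, int itersPerLaunch)
{
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    cudaEventRecord(start);
    for (int i = 0; i < launches; i++) {
        step<<<n, 1>>>(dev_a, n, itersPerLaunch);
    }
    cudaEventRecord(stop);

    // Wait until every launch recorded before `stop` has finished.
    float ms = -1.0f;
    if (cudaEventSynchronize(stop) == cudaSuccess &&
        cudaGetLastError() == cudaSuccess) {
        cudaEventElapsedTime(&ms, start, stop);
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```

Comparing timeLaunches(dev_a, n, 1000, 1) with timeLaunches(dev_a, n, 1, 1000) does the same arithmetic both ways, so any difference between the two timings comes mostly from the per-launch cost.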

    If X is particularly large, run as many iterations inside the kernel as will complete within a safe amount of time, and then loop over those kernel launches from the host.
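
The chunking idea could look roughly like this. The kernel updateChunk, the helper updateInChunks, and the chunk parameter are all hypothetical; a chunk size that stays safely under the time limit has to be found by experiment on the target GPU.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 1.0f to each element `iters` times in one launch.
__global__ void updateChunk(float *a, int n, int iters)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx >= n) return;
    float v = a[idx];
    for (int i = 0; i < iters; i++) {
        v += 1.0f;
    }
    a[idx] = v;
}

// Runs `total` iterations as a series of launches of at most `chunk`
// iterations each, so that no single launch runs for too long.
bool updateInChunks(float *dev_a, int n, int total, int chunk)
{
    if (chunk <= 0) return false;

    for (int done = 0; done < total; done += chunk) {
        int remaining = total - done;
        int iters = (remaining < chunk) ? remaining : chunk;

        updateChunk<<<n, 1>>>(dev_a, n, iters);

        // Check both the launch and the execution of this chunk.
        cudaError_t err = cudaGetLastError();
        if (err == cudaSuccess) err = cudaDeviceSynchronize();
        if (err != cudaSuccess) {
            fprintf(stderr, "chunk at iteration %d failed: %s\n",
                    done, cudaGetErrorString(err));
            return false;
        }
    }
    return true;
}
```

A larger chunk means fewer launches and less overhead, while a smaller chunk keeps each launch further from the timeout, so the value is a trade-off between the two.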
