The Archive Base

Editorial Team
Asked: June 8, 2026

extern "C" void callKernel()
{
    for(int i=0;i<10;i++)
    {
        calc<<< grid, thread >>>(d_arr);
        copyElement<<< grid, thread >>>(d_arr,d_arr_part,3);
        findMax<<< grid, thread >>>(d_arr_part, d_max);
        positionChange<<< grid, thread >>>(d_arr, d_max);
    }
}

The code above launches four kernels in a loop.

Each kernel does the following:

"calc": computes on d_arr and updates the values of its elements.

"copyElement": for example, d_arr is a 4-step array and from it I only want the 3rd element, so I allocate another variable, d_arr_part, and copy the 3rd element of d_arr into d_arr_part.

"findMax": finds the maximum value in d_arr_part and stores it in d_max.

"positionChange": updates the elements of d_arr according to the value of d_max.
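As an illustration only of the "findMax" step described above, here is a minimal sketch of a max search that gives the same result on every run. The original kernels are not shown, so everything here is an assumption: the element type is taken to be int, and the names findMaxSketch and maxOnDevice are hypothetical. A plain "if (v > *d_max) *d_max = v;" executed by many threads is a data race, which is one common cause of results that change between runs; atomicMax avoids it.

```cuda
#include <climits>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for "findMax" (assumes int data).
// atomicMax makes concurrent updates of *maxOut from many threads safe.
__global__ void findMaxSketch(const int *part, int n, int *maxOut)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicMax(maxOut, part[i]);
    }
}

// Hypothetical host helper: copies the data to the device, runs the kernel,
// and returns the maximum (INT_MIN for empty input or allocation failure).
int maxOnDevice(const std::vector<int> &h_part)
{
    const int n = static_cast<int>(h_part.size());
    if (n == 0) return INT_MIN;

    int *d_part = nullptr;
    int *d_max = nullptr;
    if (cudaMalloc(&d_part, n * sizeof(int)) != cudaSuccess) return INT_MIN;
    if (cudaMalloc(&d_max, sizeof(int)) != cudaSuccess) {
        cudaFree(d_part);
        return INT_MIN;
    }

    cudaMemcpy(d_part, h_part.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    // Reset the result before every search; a stale maximum left over from
    // a previous iteration would otherwise leak into this one.
    int result = INT_MIN;
    cudaMemcpy(d_max, &result, sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    findMaxSketch<<<blocks, threads>>>(d_part, n, d_max);

    // This blocking copy is issued after the kernel in the default stream,
    // so it waits for the kernel to finish before reading d_max.
    cudaMemcpy(&result, d_max, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(d_part);
    cudaFree(d_max);
    return result;
}
```

Calling maxOnDevice with a small host vector and comparing against a CPU maximum is a quick way to check a kernel like this in isolation. For float data atomicMax is not available directly, so a reduction would be the usual alternative.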

Problem

When I run my program, the results are not consistent: they change from run to run. I searched for this problem on Google and found that kernel functions can execute concurrently. My intention is for all the kernels to execute in sequence. I read section 3.2.5 of NVIDIA's CUDA C Programming Guide, but I can't understand what to do to solve the problem. If anybody has an idea, please show me the way. Thanks in advance.


1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 12:41 pm

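    You can call cudaDeviceSynchronize() between kernel launches to guarantee sequential order. However, your code does not require this: kernels launched into the same stream (here, the default stream) already execute in the order they were launched. So I think there might be a bug in your kernels.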
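To see the ordering for yourself, here is a minimal sketch with two hypothetical kernels (addOne and doubleAll are not from the question) launched back to back in the default stream, as in callKernel(). It also checks for launch and execution errors, which is worth adding when hunting for a bug inside the kernels. The array size and the helper name runSequentialKernels are assumptions made for the example.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernels: each thread updates exactly one element.
__global__ void addOne(int *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] += 1;
}

__global__ void doubleAll(int *a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] *= 2;
}

// Runs addOne then doubleAll three times on a zeroed array.
// Returns the common element value, or -1 on any error or mismatch.
int runSequentialKernels()
{
    const int n = 1024;
    int *d_a = nullptr;
    if (cudaMalloc(&d_a, n * sizeof(int)) != cudaSuccess) return -1;
    cudaMemset(d_a, 0, n * sizeof(int));

    dim3 thread(256);
    dim3 grid((n + thread.x - 1) / thread.x);
    for (int i = 0; i < 3; i++) {
        // Same (default) stream: doubleAll starts only after addOne ends.
        addOne<<<grid, thread>>>(d_a, n);
        doubleAll<<<grid, thread>>>(d_a, n);
    }

    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        cudaFree(d_a);
        return -1;
    }

    std::vector<int> h_a(n);
    cudaMemcpy(h_a.data(), d_a, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_a);

    for (int i = 1; i < n; i++) {
        if (h_a[i] != h_a[0]) return -1;
    }
    return h_a[0];
}
```

If the kernels ran in order, every element follows 0 -> 2 -> 6 -> 14, so the function returns 14 on every run. If your own program still gives varying results with checks like these in place, the cause is more likely a race or an out-of-bounds access inside one of the kernels than the launch order.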
