The Archive Base

Editorial Team
Asked: June 17, 2026

I am still a little unsure when it comes to shared/local memory in CUDA.

I am still a little unsure when it comes to shared/local memory in CUDA. Currently I have a kernel in which each thread allocates a list object, something like this:

__global__ void TestDynamicListPerThread()
{
    //Creates a dynamic list (Each thread gets its own list)
    DynamicList<int>  dlist(15);

    //Display some output information
    printf("Allocated a new DynamicList, size=%d, got pointer %p\n", dlist.GetSizeInBytes(),dlist.GetDataPtr());

    //Loops through and inserts multiples of four into the list
    for (int i = 0; i < 12; i++)
        dlist.InsertAtEnd(i*4);
}

By my current understanding, each thread gets its own dlist stored in local memory. Is this true?
If that is the case, is there any way at the end of the kernel's execution to grab each of the dlist objects (from another kernel), or should I be using a __shared__ array of dynamic lists allocated by the first thread?

I think I may be over-complicating things a little, but I never need to change the lists per se. The execution I am trying to achieve goes something like this:

  1. Create lists (Done on the GPU only)
  2. Produce output from each list (Done on the GPU, by each thread, needs only the information from the list allocated for that thread.)
  3. Modify/Swap lists (Still done on the GPU)
  4. Repeat 2 and 3 until some break condition is met on the host

Thanks in advance!

1 Answer

  1. Editorial Team
    Added an answer on June 17, 2026 at 6:22 pm

    By my current understanding, each thread gets its own dlist stored in local memory. Is this true?

    That is correct. Local variables are created per thread. They are stored either in registers or in local memory; where a given variable ends up depends mostly on the compiler.
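As a rough illustration of the per-thread behaviour, here is a minimal sketch. The question does not show how DynamicList stores its elements, so the sketch uses a hypothetical fixed-capacity `TinyList` whose storage is held inline; `FillPrivateLists` and `SumForThread` are likewise made-up names. If a list class instead allocates its buffer with in-kernel `new` or `malloc`, that buffer comes from the device heap in global memory, and only the object holding the pointer is a per-thread local.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for DynamicList: fixed capacity, storage held inline,
// so the entire object is an ordinary per-thread local variable.
struct TinyList {
    enum { kCapacity = 16 };
    int items[kCapacity];
    int count;

    __device__ TinyList() : count(0) {}

    __device__ bool InsertAtEnd(int value) {
        if (count >= kCapacity) return false;  // list is full
        items[count++] = value;
        return true;
    }
};

__global__ void FillPrivateLists(int* sums, int numThreads)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= numThreads) return;

    TinyList list;                      // every thread gets its own instance
    for (int i = 0; i < 12; i++)
        list.InsertAtEnd(i * 4 + tid);  // contents differ from thread to thread

    int sum = 0;
    for (int i = 0; i < list.count; i++)
        sum += list.items[i];

    sums[tid] = sum;  // the list itself is gone once the thread exits
}

// Host helper: launches one block of numThreads threads and returns the sum
// that thread `tid` computed from its private list.
int SumForThread(int tid, int numThreads)
{
    int* dSums = nullptr;
    cudaMalloc(&dSums, numThreads * sizeof(int));
    FillPrivateLists<<<1, numThreads>>>(dSums, numThreads);

    int result = -1;
    // cudaMemcpy waits for the kernel to finish before copying.
    cudaMemcpy(&result, dSums + tid, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dSums);
    return result;
}

int main()
{
    int s0 = SumForThread(0, 8);
    int s3 = SumForThread(3, 8);
    assert(s0 != s3);  // different threads saw different list contents
    printf("thread 0: %d\nthread 3: %d\n", s0, s3);
    return 0;
}
```

Each thread ends up with a different sum because it only ever saw its own list; nothing but the value written to `sums` survives the kernel.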

    If that is the case, is there any way at the end of the kernel's execution to grab each of the dlist objects (from another kernel), or should I be using a __shared__ array of dynamic lists allocated by the first thread?

    Local memory is private to the thread (one exception: starting with compute capability 3.0, intra-warp shuffle instructions can exchange thread-local values between the threads of a warp), so you need to copy the local variable to a global memory location if you need its value outside the kernel.
    __shared__ memory is allocated per thread block and is only accessible within that thread block, so again you would need to copy the value to a global memory location.
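A minimal sketch of both copies, assuming made-up kernel names (`ProduceValues`, `ConsumeValues`) and a made-up host helper (`RunHandOff`): the first kernel writes a thread-private value and a per-block shared total to global memory, and a second kernel then reads the per-thread values back.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 64

// First kernel: each thread computes a private value, and each block
// accumulates a total in shared memory. Both are copied to global memory.
__global__ void ProduceValues(int* perThread, int* perBlock)
{
    __shared__ int blockTotal;  // one copy per thread block
    int tid = blockIdx.x * blockDim.x + threadIdx.x;

    if (threadIdx.x == 0) blockTotal = 0;
    __syncthreads();

    int mine = tid * 2;            // thread-private value
    atomicAdd(&blockTotal, mine);  // combine within the block
    __syncthreads();

    perThread[tid] = mine;                  // local -> global
    if (threadIdx.x == 0)
        perBlock[blockIdx.x] = blockTotal;  // shared -> global
}

// Second kernel: reads what the first kernel left in global memory.
__global__ void ConsumeValues(const int* perThread, int* out)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    out[tid] = perThread[tid] + 1;
}

// Host helper: runs both kernels on `blocks` blocks, stores the per-block
// totals in hostTotals (length `blocks`), and returns the last thread's output.
int RunHandOff(int blocks, int* hostTotals)
{
    int n = blocks * THREADS_PER_BLOCK;
    int *dPerThread = nullptr, *dPerBlock = nullptr, *dOut = nullptr;
    cudaMalloc(&dPerThread, n * sizeof(int));
    cudaMalloc(&dPerBlock, blocks * sizeof(int));
    cudaMalloc(&dOut, n * sizeof(int));

    // Kernels launched on the same stream run in order.
    ProduceValues<<<blocks, THREADS_PER_BLOCK>>>(dPerThread, dPerBlock);
    ConsumeValues<<<blocks, THREADS_PER_BLOCK>>>(dPerThread, dOut);

    int last = -1;
    cudaMemcpy(hostTotals, dPerBlock, blocks * sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(&last, dOut + (n - 1), sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dPerThread);
    cudaFree(dPerBlock);
    cudaFree(dOut);
    return last;
}

int main()
{
    int totals[2] = {0, 0};
    int last = RunHandOff(2, totals);
    assert(totals[0] != totals[1]);  // each block kept its own shared total
    printf("block totals: %d %d, last output: %d\n", totals[0], totals[1], last);
    return 0;
}
```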

    What you probably need is something like a global array of lists that you pass around to your kernels as a parameter.
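One way this could look for the four steps in the question is sketched below. It assumes each list fits in a fixed `LIST_CAPACITY` so that all lists can sit in one flat global array with a second array of lengths. The kernel names, the placeholder "modify" step, and the placeholder break condition are illustrative assumptions, not taken from the question.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

#define LIST_CAPACITY 16  // assumed upper bound on elements per list

// List t occupies data[t * LIST_CAPACITY ...]; its length is sizes[t].

// Step 1: each thread creates its list in global memory.
__global__ void CreateLists(int* data, int* sizes, int numLists)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numLists) return;
    int* list = data + t * LIST_CAPACITY;
    for (int i = 0; i < 12; i++)
        list[i] = i * 4;  // multiples of four, as in the question
    sizes[t] = 12;
}

// Step 2: each thread produces output from its own list only.
__global__ void ProduceOutput(const int* data, const int* sizes,
                              int* output, int numLists)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numLists) return;
    const int* list = data + t * LIST_CAPACITY;
    int sum = 0;
    for (int i = 0; i < sizes[t]; i++)
        sum += list[i];
    output[t] = sum;
}

// Step 3: placeholder modification, appends the next multiple of four.
__global__ void ModifyLists(int* data, int* sizes, int numLists)
{
    int t = blockIdx.x * blockDim.x + threadIdx.x;
    if (t >= numLists) return;
    int n = sizes[t];
    if (n < LIST_CAPACITY) {
        data[t * LIST_CAPACITY + n] = n * 4;
        sizes[t] = n + 1;
    }
}

// Step 4: host loop. Placeholder break condition: stop once the output of
// list 0 reaches `threshold`, or after maxRounds rounds.
int RunLists(int numLists, int threshold, int maxRounds)
{
    int *dData = nullptr, *dSizes = nullptr, *dOutput = nullptr;
    cudaMalloc(&dData, numLists * LIST_CAPACITY * sizeof(int));
    cudaMalloc(&dSizes, numLists * sizeof(int));
    cudaMalloc(&dOutput, numLists * sizeof(int));

    int threads = 64;
    int blocks = (numLists + threads - 1) / threads;
    CreateLists<<<blocks, threads>>>(dData, dSizes, numLists);

    int first = 0;
    for (int round = 0; round < maxRounds; round++) {
        ProduceOutput<<<blocks, threads>>>(dData, dSizes, dOutput, numLists);
        cudaMemcpy(&first, dOutput, sizeof(int), cudaMemcpyDeviceToHost);
        if (first >= threshold) break;
        ModifyLists<<<blocks, threads>>>(dData, dSizes, numLists);
    }

    cudaFree(dData);
    cudaFree(dSizes);
    cudaFree(dOutput);
    return first;
}

int main()
{
    int result = RunLists(100, 400, 10);
    assert(result >= 400);  // the loop stopped on the break condition
    printf("final output of list 0: %d\n", result);
    return 0;
}
```

The lists stay on the GPU for the whole run; only the single value needed for the break condition is copied back each round. If step 3 has to swap whole lists rather than modify them, an extra global array mapping each thread to a list index would probably avoid moving the elements themselves.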


© 2021 The Archive Base. All Rights Reserved