The Archive Base


Editorial Team
Asked: June 3, 2026


Sorry if this is obvious, but I’m studying C++ and CUDA right now and wanted to know whether this is possible, so I can focus on the relevant sections.

Basically, my problem is highly parallelizable; in fact, I’m currently running it on multiple servers. My program gets a work item (a very small list), runs a loop on it, and makes one of 3 decisions:

  1. Keep the data (save it).
  2. Discard the data (do nothing with it).
  3. Process the data further (it is unsure what to do, so it modifies the data and resends it to the queue for processing).

This used to be a recursion, but I made each part independent. I’m no longer bound by one CPU, but the downside is that a lot of messages pass back and forth. I understand at a high level how CUDA works and how to submit work to it, but is it possible for CUDA to manage the queue on the device itself?

My current plan is to manage the queue on the C++ host and send the processing to the device, after which the results are returned to the host and sent back to the device (and so on). I think that could work, but I wanted to see whether it is possible to keep the queue in CUDA memory itself and have kernels take work from it and send work directly to it.

Is something like this possible with CUDA or is there a better way to do this?

1 Answer

Editorial Team added an answer on June 3, 2026 at 7:39 am

    I think what you’re asking is whether you can keep intermediate results on the device. The answer is yes. In other words, you should only need to copy new work items to the device and copy finished items from the device. Work items that are still undetermined can stay on the device between kernel calls.
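A minimal sketch of that idea, assuming work items are plain `int` values and that the hypothetical `refine` kernel stands in for the real "process further" step. The buffer is uploaded once, several kernel launches reuse it, and the data is downloaded once at the end. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical refinement step: each thread modifies one undetermined item.
__global__ void refine(int* items, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) items[i] += 1;
}

// Uploads the items once, runs `passes` kernel launches against the same
// device buffer, and downloads the result once at the end.
std::vector<int> run_passes(std::vector<int> items, int passes)
{
    int n = static_cast<int>(items.size());
    if (n == 0) return items;

    int* d_items = nullptr;
    cudaMalloc(&d_items, n * sizeof(int));
    cudaMemcpy(d_items, items.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    for (int p = 0; p < passes; ++p)
        refine<<<blocks, threads>>>(d_items, n);  // data never leaves the device

    // cudaMemcpy on the default stream waits for the kernels to finish.
    cudaMemcpy(items.data(), d_items, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_items);
    return items;
}
```

Calling `run_passes({1, 2, 3}, 4)` performs one upload, four launches, and one download.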

    You may want to look into CUDA Thrust for this. Thrust has efficient algorithms for transformations, which can be combined with custom logic (search for “kernel fusion” in the Thrust manual). It sounds like your processing can be treated as a transformation: you take a vector of work items and create two new vectors, one of items to keep and one of items that are still undetermined.
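One way this split might look with Thrust, as a sketch only. The `classify` rule, the `Refine` functor, and the use of plain `int` items are all assumptions made for illustration; the real decision logic is not given in the question. The second `copy_if` uses a transform iterator so that the modification and the filtering happen in a single pass, which is the kind of fusion mentioned above.

```cuda
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/iterator/transform_iterator.h>

// Hypothetical decision rule standing in for the real one: multiples of 3
// are kept, other values above 100 are discarded, the rest need more work.
enum Decision { KEEP, DISCARD, REFINE };

__host__ __device__ inline Decision classify(int x)
{
    if (x % 3 == 0) return KEEP;
    if (x > 100)    return DISCARD;
    return REFINE;
}

struct IsKeep   { __host__ __device__ bool operator()(int x) const { return classify(x) == KEEP; } };
struct IsRefine { __host__ __device__ bool operator()(int x) const { return classify(x) == REFINE; } };
struct Refine   { __host__ __device__ int  operator()(int x) const { return x + 1; } };

// Splits `items` into finished items (`kept`) and modified items that need
// another pass (`undetermined`). Discarded items are simply not copied.
void split(const thrust::device_vector<int>& items,
           thrust::device_vector<int>& kept,
           thrust::device_vector<int>& undetermined)
{
    kept.resize(items.size());
    auto kept_end = thrust::copy_if(items.begin(), items.end(), kept.begin(), IsKeep());
    kept.resize(kept_end - kept.begin());

    // Fused step: the transform iterator applies Refine on the fly, while the
    // original values serve as the stencil that IsRefine tests.
    undetermined.resize(items.size());
    auto refined = thrust::make_transform_iterator(items.begin(), Refine());
    auto und_end = thrust::copy_if(refined, refined + items.size(),
                                   items.begin(), undetermined.begin(), IsRefine());
    undetermined.resize(und_end - undetermined.begin());
}
```

Only `kept` would need to be copied back to the host; `undetermined` can be fed straight into the next pass.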

    “Is the host aware of (or can it monitor) memory on the device? My concern is how to detect and deal with data that starts to exceed the GPU’s onboard memory.”

    It is possible to allocate and free memory from within a kernel, but it’s probably not going to be very efficient. Instead, manage memory from the host with CUDA calls such as cudaMalloc() and cudaFree() or, if you’re using Thrust, by creating or resizing vectors between kernel calls.

    With this “manual” memory management, you can check how much device memory is free and how much exists in total with cudaMemGetInfo().

    Since you will be copying completed work items back to the host, you will know how many work items are left on the device and thus the maximum amount of memory that a kernel call might require.
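A sketch of such a check, assuming each pass needs a source buffer plus an equally sized destination buffer. The function name `batch_fits` and the `headroom` safety margin are hypothetical; only `cudaMemGetInfo()` itself is part of the CUDA runtime API.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// Returns true if `count` items of `item_bytes` each, plus an equally sized
// destination buffer, fit in the memory the device currently reports as free.
// `headroom` is a hypothetical safety margin (fraction of free memory to use).
bool batch_fits(std::size_t count, std::size_t item_bytes, double headroom = 0.9)
{
    std::size_t free_bytes = 0, total_bytes = 0;
    if (cudaMemGetInfo(&free_bytes, &total_bytes) != cudaSuccess)
        return false;  // treat a failed query as "does not fit"

    // Source + destination buffers; computed in double to avoid overflow.
    double needed = 2.0 * static_cast<double>(count) * static_cast<double>(item_bytes);
    return needed <= headroom * static_cast<double>(free_bytes);
}
```

The host could call this before uploading a new batch and hold items back in host memory whenever it returns false.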

    A good strategy may be to swap the source and destination vectors for each transform. To take a simple example, say you have a set of work items that you want to filter in multiple steps. You create vector A and fill it with work items. Then you create vector B of the same size and leave it empty. After the filtering, some portion of the work items in A has been moved to B, and you have the count. Now you run the filter again, this time with B as the source and A as the destination.
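The A/B swap described above could be sketched with Thrust as follows. The per-step predicate `SurvivesStep` is a made-up filter used only to make the example concrete, and `filter_in_steps` is a hypothetical helper name.

```cuda
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <cstddef>

// Hypothetical per-step filter: an item survives if it is divisible by `divisor`.
struct SurvivesStep
{
    int divisor;
    __host__ __device__ bool operator()(int x) const { return x % divisor == 0; }
};

// Filters `a` through `steps` passes, alternating between two buffers of the
// same size. Returns the number of survivors, which end up in `a`.
std::size_t filter_in_steps(thrust::device_vector<int>& a, int steps)
{
    thrust::device_vector<int> b(a.size());  // vector B: allocated once, same size as A
    std::size_t count = a.size();

    for (int s = 0; s < steps && count > 0; ++s) {
        SurvivesStep pred{s + 2};  // step 0 uses divisor 2, step 1 uses 3, ...
        auto end = thrust::copy_if(a.begin(), a.begin() + count, b.begin(), pred);
        count = end - b.begin();
        a.swap(b);  // the destination becomes the source of the next step
    }
    a.resize(count);
    return count;
}
```

Because both buffers are allocated up front and only swapped, no device memory is allocated or freed inside the loop.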
