The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: June 1, 2026

This question follows on from the linked question: shared memory optimization confusion

In talonmies's answer to that question, I found that the first limit on the number of blocks that can be scheduled to run is 8. I have three questions:

  1. Does this mean that only 8 blocks can be scheduled at the same time whenever the block counts given by conditions 2 and 3 exceed 8? Does that hold regardless of the CUDA environment, GPU device, or algorithm?

  2. If so, then in some cases it is better not to use shared memory, and we need a way to judge which is better: using shared memory or not. One approach, I think, is to check whether the kernel is limited by global memory access (a memory bandwidth bottleneck). That is, we could choose not to use shared memory when there is no such bottleneck. Is this a good approach?

  3. Further to question 2: if the data my CUDA program has to handle is huge, I think not using shared memory may be better, because the data is hard to fit within shared memory. Is this also a good approach?

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 4:28 pm

    The number of concurrently scheduled blocks is always going to be limited by something.

    Playing with the CUDA Occupancy Calculator should make it clear how this works. The usage of three types of resources affects the number of concurrently scheduled blocks: Threads Per Block, Registers Per Thread, and Shared Memory Per Block.

    If you set up a kernel that uses 1 thread per block, 1 register per thread, and 1 byte of shared memory per block on Compute Capability 2.0, you are limited by Max Blocks per Multiprocessor, which is 8. If you start increasing Shared Memory Per Block, Max Blocks per Multiprocessor will continue to be your limiting factor until you reach a threshold at which Shared Memory Per Block becomes the limiting factor. Since there are 49152 bytes of shared memory per SM, that happens at around 49152 / 8 = 6144 bytes per block (it's a bit less, because some shared memory is used by the system and it is allocated in chunks of 128 bytes).

    In other words, given the limit of 8 Max Blocks per Multiprocessor, using shared memory is completely free (as it relates to the number of concurrently running blocks), as long as you stay below the threshold at which Shared Memory Per Block becomes the limiting factor.

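    The same goes for register usage: it costs nothing in terms of concurrently running blocks until Registers Per Thread becomes the limiting factor.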
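The reasoning above can be sketched as a small calculation. This is a simplified model, not the Occupancy Calculator itself: the function name `blocks_per_sm` is made up for illustration, the per-SM figures for Compute Capability 2.0 (1536 threads and 32768 registers, alongside the 8 blocks and 49152 bytes quoted above) are taken as assumptions, and register allocation granularity and system-reserved shared memory are ignored.

```python
# Assumed per-SM limits for Compute Capability 2.0 (simplified model).
MAX_BLOCKS_PER_SM = 8        # hard cap on resident blocks
MAX_THREADS_PER_SM = 1536    # assumption: resident thread limit
REGISTERS_PER_SM = 32768     # assumption: 32-bit registers per SM
SHARED_MEM_PER_SM = 49152    # bytes
SHARED_ALLOC_UNIT = 128      # bytes; shared memory is allocated in chunks


def round_up(value, unit):
    """Round value up to the next multiple of unit."""
    return -(-value // unit) * unit


def blocks_per_sm(threads_per_block, regs_per_thread, smem_per_block):
    """Estimate resident blocks per SM as the minimum over all limits.

    Hypothetical helper: ignores register allocation granularity and
    any shared memory reserved by the system.
    """
    by_threads = MAX_THREADS_PER_SM // threads_per_block
    by_regs = REGISTERS_PER_SM // (regs_per_thread * threads_per_block)
    smem = round_up(smem_per_block, SHARED_ALLOC_UNIT)
    # A kernel using no shared memory is not limited by it at all.
    by_smem = SHARED_MEM_PER_SM // smem if smem else MAX_BLOCKS_PER_SM
    return min(MAX_BLOCKS_PER_SM, by_threads, by_regs, by_smem)


# Sweep shared memory per block for a 128-thread, 16-register kernel.
for smem in (1, 4096, 6144, 6145, 12288):
    print(smem, blocks_per_sm(128, 16, smem))
```

Under these assumptions the result stays at 8 blocks up to 6144 bytes of shared memory per block and drops to 7 at 6145 bytes. Registers behave the same way: with 128 threads per block, 32 registers per thread still allows 8 blocks, while 33 allows only 7. On real hardware the exact crossover can differ, so the Occupancy Calculator remains the better guide.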


© 2021 The Archive Base. All Rights Reserved