The Archive Base

Editorial Team
Asked: May 26, 2026


I have an NVIDIA GTX 570 (compute capability 2.0) running CUDA 4.0.

The deviceQuery executable in the CUDA SDK gives me information on my CUDA device and its various properties. Two of the lines in the output are

Maximum number of threads per block: 1024

Maximum sizes of each dimension of a block: 1024 x 1024 x 64

Why is the third dimension of the block restricted to at most 64 threads, whereas the X and Y dimensions can each go up to 1024 threads?

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 2:32 pm

    EDIT2: Also, please take this with a grain of salt; this is a purely hypothetical answer, or a guess. There may indeed be a clear hardware-based reason why 64 is the maximum. Frankly I don’t know, and my answer is based on the assumption that there is no such hardware limit, per se.

    It’s probably a combination of three things: first, there is a limit to the number of threads which can be resident inside a block; second, block dimensions are typically in multiples of 32, and even more often in powers of 2 greater than 32; third, coordinate systems used in the solution of multi-dimensional problems are most often oriented so that you’re looking at the scene directly (i.e., with the important bits more distributed in X and Y than in Z).
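To make the first point concrete, here is a minimal Python sketch (plain arithmetic, not CUDA code) that checks a block shape against the two deviceQuery limits quoted in the question. The helper name `block_fits` and the constant names are illustrative, not part of any CUDA API. The sketch assumes a block must satisfy both the per-dimension maximums and the total threads-per-block maximum.

```python
# Limits copied from the deviceQuery output quoted in the question
# (GTX 570, compute capability 2.0).
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)

def block_fits(x, y=1, z=1):
    """Return True if an (x, y, z) block respects both limits (illustrative helper)."""
    dims = (x, y, z)
    # Each dimension must be at least 1 and no larger than its own maximum.
    within_dims = all(1 <= d <= m for d, m in zip(dims, MAX_BLOCK_DIM))
    # The product of the dimensions is the total thread count of the block.
    within_total = x * y * z <= MAX_THREADS_PER_BLOCK
    return within_dims and within_total

print(block_fits(1024, 1, 1))      # full-width 1D block
print(block_fits(32, 32, 1))       # 32 * 32 = 1024 threads
print(block_fits(4, 4, 64))        # 4 * 4 * 64 = 1024 threads
print(block_fits(1024, 1024, 64))  # every dimension at its maximum at once
```

The first three shapes pass and the last one fails: the per-dimension maximums only bound each axis individually, while the 1024-thread total still applies to the product.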

    CUDA naturally has to support 1D access, as this is an immensely common and efficient access pattern when it is applicable. To support this, the X dimension must be allowed to vary over the entire range of 1024 threads.

    To support 2D access, which is less common, CUDA should minimally support up to 512 in the X dimension (using the convention that the X dimension is oriented along the biggest spread of the problem) and 32 in the Y dimension, since a block with X >= Y and at most 1024 threads can never have Y larger than 32. X must already go up to 1024 for the 1D case, and I suppose they relaxed the requirement that the X dimension be no smaller than the Y dimension and allowed the full 1024 range of Y values as well. However, in my understanding, 32 would have been plenty big for the Y dimension maximum.
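The figure of 32 can be checked with a little arithmetic. The Python sketch below assumes the convention used in this answer (X >= Y) together with the 1024-thread block limit, and searches for the largest Y that any such 2D block could use. `largest_y` is a hypothetical helper name chosen for illustration.

```python
MAX_THREADS_PER_BLOCK = 1024  # from the deviceQuery output in the question

def largest_y(max_threads=MAX_THREADS_PER_BLOCK):
    """Largest Y for which some X >= Y satisfies X * Y <= max_threads."""
    best = 0
    for y in range(1, max_threads + 1):
        # The smallest X the convention allows is X = Y, so Y is usable
        # only if Y * Y still fits within the block's thread budget.
        if y * y <= max_threads:
            best = y
    return best

print(largest_y())  # 32, since 32 * 32 = 1024
```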

    To support 3D access, maintaining X, Y >= Z and trying to reach 1024 threads, it seems to me that the best case is X = Y = Z = 10 (1000 threads; 11 x 11 x 11 = 1331 would exceed the limit), so there’s no real argument for allowing Z to be greater than 10, given my assumptions.
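The same kind of check works for the 3D case. Under the assumption that Z is the smallest dimension, the tightest shape for a given Z is the cube Z x Z x Z, so the largest usable Z is the largest integer whose cube fits in 1024 threads. The helper name `largest_z` is, again, only illustrative.

```python
MAX_THREADS_PER_BLOCK = 1024  # from the deviceQuery output in the question

def largest_z(max_threads=MAX_THREADS_PER_BLOCK):
    """Largest Z for which the cube Z * Z * Z still fits in max_threads."""
    z = 1
    # Grow Z while the next larger cube would still respect the limit.
    while (z + 1) ** 3 <= max_threads:
        z += 1
    return z

z = largest_z()
print(z, z ** 3)  # 10 1000
```

Under these assumptions, a Z maximum of 64 is well beyond what a block with X, Y >= Z could ever use.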

    In summary, I don’t see why they couldn’t have made the maximums (1024, 32, 10). My question is why make them (1024, 1024, 64)? The only answer I keep coming back to is to allow some flexibility to programmers to violate the X>=Y>=Z coordinate system convention.

    Edit: given my summary and hypothetical answer, the real answer to your question is this: it’s an arbitrary decision.

