I have a NVIDIA GTX 570 compute capability 2.0 running cuda-4.0. The deviceQuery executable

Question

Asked: May 26, 2026 (2026-05-26T14:32:32+00:00)

I have an NVIDIA GTX 570 (compute capability 2.0) running CUDA 4.0.

The deviceQuery executable in the CUDA SDK gives me information on my CUDA device and its various properties. Two of the lines in the output are

Maximum number of threads per block: 1024

Maximum sizes of each dimension of a block: 1024 x 1024 x 64

Why is the third dimension of the block restricted to at most 64 threads, whereas the X and Y dimensions can each go up to 1024 threads?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T14:32:33+00:00

EDIT2: Also, please take this with a grain of salt; this is a purely hypothetical answer, or a guess. There may indeed be a clear hardware-based reason why 64 is the maximum. Frankly, I don't know, and my answer is based on the assumption that there is no such hardware limit per se.

It’s probably a combination of three things: first, there is a limit to the number of threads which can be resident inside a block; second, block dimensions are typically in multiples of 32, and even more often in powers of 2 greater than 32; third, coordinate systems used in the solution of multi-dimensional problems are most often oriented so that you’re looking at the scene directly (i.e., with the important bits more distributed in X and Y than in Z).

CUDA naturally has to support 1D access, as this is an immensely common and efficient access pattern when it is applicable. To support this, the X dimension must be allowed to vary over the entire range of 1024 threads.

To support 2D access, which is less common, CUDA should minimally support up to 512 in the X dimension (using the convention that the X dimension should be oriented in the coordinate system so that it measures the biggest spread) and 32 in the Y dimension. It must support up to 1024 in the X dimension, and I suppose they relax the requirement that the X dimension be no smaller than the Y dimension and allow the full 1024 range of Y values. However, in my understanding, 32 would have been plenty big for the Y dimension maximum.

To support 3D access, maintaining X, Y >= Z and trying to reach 1024 threads, it seems to me that the best case is X=Y=Z=10 (10^3 = 1000 fits within 1024, while 11^3 = 1331 does not); so there's no real argument for allowing Z to be greater than 10, given my assumptions.
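The arithmetic behind the 1024, 32 and 10 figures above can be checked with a few lines of Python. This is only a sketch of the reasoning in this answer, not anything taken from CUDA itself: the function name `largest_equal_extent` is made up for illustration, and the only input is the 1024-threads-per-block figure from the deviceQuery output in the question.

```python
# Thread limit per block, taken from the deviceQuery output in the question.
MAX_THREADS_PER_BLOCK = 1024


def largest_equal_extent(rank, limit=MAX_THREADS_PER_BLOCK):
    """Largest n such that n**rank <= limit.

    Under the X >= Y >= Z convention assumed in this answer, this is
    the largest value the smallest dimension of a rank-D block can take.
    (Hypothetical helper, for illustration only.)
    """
    n = 1
    while (n + 1) ** rank <= limit:
        n += 1
    return n


for rank in (1, 2, 3):
    print(rank, largest_equal_extent(rank))
```

For ranks 1, 2 and 3 this gives 1024, 32 and 10, which is where the hypothetical (1024, 32, 10) maximums come from.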

In summary, I don’t see why they couldn’t have made the maximums (1024, 32, 10). My question is why make them (1024, 1024, 64)? The only answer I keep coming back to is to allow some flexibility to programmers to violate the X>=Y>=Z coordinate system convention.
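To make the interaction between the two deviceQuery lines concrete, here is a small Python sketch that models them as two independent checks: each dimension has its own cap, and the product of all three must still fit in the per-block thread limit. The constants come from the output quoted in the question; the function `block_is_valid` is a hypothetical helper, not a CUDA API.

```python
# Limits copied from the deviceQuery output quoted in the question.
MAX_THREADS_PER_BLOCK = 1024
MAX_BLOCK_DIM = (1024, 1024, 64)


def block_is_valid(x, y=1, z=1):
    """Return True if an (x, y, z) block satisfies both limits.

    Hypothetical helper: it only models the two reported limits and
    ignores other constraints such as register or shared memory usage.
    """
    dims = (x, y, z)
    if any(d < 1 for d in dims):
        return False
    within_each_dim = all(d <= cap for d, cap in zip(dims, MAX_BLOCK_DIM))
    return within_each_dim and x * y * z <= MAX_THREADS_PER_BLOCK


for dims in [(1024, 1, 1), (32, 32, 1), (4, 4, 64), (1, 1, 65), (1024, 1024, 64)]:
    print(dims, block_is_valid(*dims))
```

The first three shapes pass both checks. (1, 1, 65) fails the per-dimension cap on Z, and (1024, 1024, 64) fails the thread-count limit even though every dimension is individually within range, so the per-dimension maximums can never all be reached at once.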

Edit: given my summary and hypothetical answer, the real answer to your question is this: it's an arbitrary decision.
