I am calling CUDA kernels from matlab. I was previously told that ( David

Question

Asked: June 10, 2026 (2026-06-10T05:31:22+00:00)

I am calling CUDA kernels from matlab.

I was previously told (in David Kirk’s book) that one could only use 16 KB of shared memory per thread, but I am able to consume far more than that:

__global__ void plain(float* arg)
{

    __shared__ float array[12000];
    int k;

    for (k=1;k<12000; k++)
    {
        array[k]=1;
    }   
}

CUDA C reports that a float is 4 bytes, meaning that the total array size is 48 KB, which is greater than 16 KB. It runs fine, so how can this be?

I am also told in "GPU shared memory size is very small – what can I do about it?" that the max shared memory per block is important. The max shared memory per block for my card is 49152 bytes, yet I am able to run the above code with 1000 threads per block.

It seems like it would use 49 KB per block, which can’t be right. Is it that the SM only services one block at a time, and in doing so preserves the condition that there can only be 49 KB per thread block?

How is 49 KB of shared memory per block reconciled with 16 KB of shared memory per thread?

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-06-10T05:31:23+00:00

Shared memory is allocated per thread block, not per thread, with as much as 48 KB available per SM with compute capability 2.0 and up. So on a given SM you could be running a single thread block that consumes the entire 48 KB or, say, three thread blocks each of which allocates 16 KB. Because your array is declared __shared__, there is a single 48,000-byte copy per block that all 1000 threads share, and that fits within your card's 49,152-byte per-block limit. The limit of 16 KB of shared memory per SM applies to compute capabilities < 2.0. As opposed to shared memory, which is allocated per thread block, local memory (“local” meaning “thread local”) is allocated per thread.
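
To see the per-block accounting on your own card, something like the following sketch may help. It prints the shared memory limits the runtime reports, asks how many blocks of this kernel can be resident on one SM, and then launches the kernel. The names fillShared, kElems and sharedBytes are made up for illustration, and the choice of device 0 and a grid of 4 blocks is an assumption, not something taken from the question.

```cuda
#include <cstdio>
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical constant: element count taken from the kernel in the question.
constexpr int kElems = 12000;

// Bytes of static shared memory one block needs for n floats.
constexpr size_t sharedBytes(size_t n) { return n * sizeof(float); }

// One copy of `array` exists per block, not per thread. The threads of the
// block cooperate to fill it using a block-stride loop.
__global__ void fillShared(float* out)
{
    __shared__ float array[kElems];

    for (int k = threadIdx.x; k < kElems; k += blockDim.x) {
        array[k] = 1.0f;
    }
    __syncthreads();  // make every thread's writes visible to the whole block

    // Thread 0 writes one value back so the compiler cannot discard the array.
    if (threadIdx.x == 0) {
        out[blockIdx.x] = array[kElems - 1];
    }
}

int main()
{
    const int device = 0;            // assumption: query the first device
    const int threadsPerBlock = 1000;
    const int numBlocks = 4;         // assumption: arbitrary small grid

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        std::printf("No usable CUDA device found\n");
        return 1;
    }

    std::printf("Compute capability:      %d.%d\n", prop.major, prop.minor);
    std::printf("Shared memory per block: %zu bytes\n", prop.sharedMemPerBlock);
    std::printf("Shared memory per SM:    %zu bytes\n", prop.sharedMemPerMultiprocessor);
    std::printf("Kernel needs per block:  %zu bytes\n", sharedBytes(kElems));

    // How many blocks of this kernel can be resident on one SM at once,
    // given its static shared memory use and the chosen block size?
    int blocksPerSM = 0;
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&blocksPerSM, fillShared,
                                                  threadsPerBlock, 0);
    std::printf("Resident blocks per SM:  %d\n", blocksPerSM);

    float* out = nullptr;
    cudaMalloc(&out, numBlocks * sizeof(float));
    fillShared<<<numBlocks, threadsPerBlock>>>(out);

    // Check the launch itself first, then wait for the kernel to finish.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    std::printf("Launch status: %s\n", cudaGetErrorString(err));

    cudaFree(out);
    return err == cudaSuccess ? 0 : 1;
}
```

On an SM with 48 KB of shared memory, two such blocks would need 96,000 bytes, so only one block of this kernel can be resident on the SM at a time. Additional blocks in the grid simply wait their turn, which is why the launch succeeds regardless of the thread count per block.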
