I’m using an HD5770, which has 10 compute units and 32 KB of local memory.

Question


Editorial Team

Asked: June 18, 2026


I’m using an HD5770, which has 10 compute units and 32 KB of local memory.

My global size is 256 * 256 and my local size is 256.

Each work-group needs 1 KB of local memory, which I specify like this:

clSetKernelArg(predicate, param++, 1024, NULL);

First: Is this the correct way to allocate local memory? Or do I have to specify the total size of the buffer used by all work-groups together when setting the kernel argument, and then index into that buffer depending on the local ID?

Second: Will one workgroup execute on only one compute unit?

Third: Will the memory be freed after the work-group has finished? (32 KB won’t be enough for 256 work-groups if each of them uses 1 KB.)

Or, more generally: will the scheduler make sure that no more than 32 work-groups run in parallel?

Thank you!


1 Answer

Editorial Team · Answer 1 · 2026-06-18T23:46:37+00:00

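1) Yes. Passing a size with a NULL pointer allocates that many bytes of uninitialized local memory for each work-group, so 1024 is the per-work-group size and you don’t need to multiply it by the number of work-groups. You can also declare the memory inside the kernel, at kernel function scope, like this:

__local float localBuffer[256]; // 256 floats * 4 bytes = 1024 bytes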
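To make the size arithmetic concrete, here is a minimal sketch in plain C. The kernel source, its parameter names (`in`, `scratch`) and the helper `local_bytes_for_floats` are hypothetical and only illustrate how a `__local` pointer argument pairs with the byte count given to clSetKernelArg; the actual host call is shown as a comment because it needs the OpenCL headers and a device.

```c
#include <stddef.h>

/* Hypothetical kernel source, embedded as a string the way host programs
 * commonly hand it to clCreateProgramWithSource. The __local pointer
 * parameter receives the per-work-group allocation made on the host. */
const char *const PREDICATE_KERNEL_SRC =
    "__kernel void predicate(__global const float *in,\n"
    "                        __local float *scratch)\n"
    "{\n"
    "    size_t lid = get_local_id(0);\n"
    "    scratch[lid] = in[get_global_id(0)];\n"
    "    barrier(CLK_LOCAL_MEM_FENCE);\n"
    "}\n";

/* Byte count to pass as arg_size for a __local float buffer
 * holding `count` elements. */
static size_t local_bytes_for_floats(size_t count)
{
    return count * sizeof(float);
}

/* Host side (not compiled here, requires <CL/cl.h>):
 *   clSetKernelArg(predicate, 1, local_bytes_for_floats(256), NULL);
 * The NULL pointer marks the argument as local memory; the size is
 * what each work-group gets, not a total across work-groups. */
```

With one float per work-item and a local size of 256, `local_bytes_for_floats(256)` gives the 1024 bytes from the question, assuming a 4-byte float. An array of 1024 floats would take four times that.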

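2) Yes. In the OpenCL execution model, all work-items of a work-group execute on a single compute unit, which is what allows them to share local memory and synchronize with barriers. A compute unit may hold several work-groups at the same time, but a single work-group is never split across compute units.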

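3) You don’t have to free it. Local memory only exists for the lifetime of a work-group, and its contents are uninitialized when the next work-group starts. The limit applies to the work-groups that are resident at the same time, not to the whole launch: the runtime only places as many work-groups on a compute unit as its local memory and other resources allow, and the remaining ones run as earlier ones finish. So 256 work-groups using 1 KB each is not a problem.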
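The numbers from the question can be worked through with a small sketch in plain C. Both function names are hypothetical, and the calculation assumes that the 32 KB figure is the per-compute-unit value (the kind of number clGetDeviceInfo reports for CL_DEVICE_LOCAL_MEM_SIZE). It only considers local memory, so it gives an upper bound; registers and other hardware limits can lower the real number of resident work-groups.

```c
#include <stddef.h>

/* Number of work-groups in a 1-D launch. Assumes global_size is a
 * multiple of local_size and local_size is non-zero. */
static size_t work_group_count(size_t global_size, size_t local_size)
{
    return global_size / local_size;
}

/* Upper bound on work-groups that can be resident on one compute unit
 * when local memory is the only limit. Assumes bytes_per_group > 0. */
static size_t max_groups_per_cu_by_local_mem(size_t local_mem_per_cu,
                                             size_t bytes_per_group)
{
    return local_mem_per_cu / bytes_per_group;
}
```

For this launch, `work_group_count(256 * 256, 256)` is 256 work-groups in total, while `max_groups_per_cu_by_local_mem(32 * 1024, 1024)` is at most 32 per compute unit at any one moment. The other work-groups simply wait for their turn.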

Hope this helps
