In CUDA, there is a concept of a warp, which is defined as

Asked: May 24, 2026


In CUDA, there is a concept of a warp, which is defined as the maximum number of threads that can execute the same instruction simultaneously within a single processing element. For NVIDIA, this warp size is 32 for all of their cards currently on the market.

In ATI cards, there is a similar concept, but the terminology in this context is wavefront. After some hunting around, I found out that the ATI card I have has a wavefront size of 64.

My question is: how can I query this SIMD width at runtime in OpenCL?


1 Answer

Editorial Team · Answer 1 · 2026-05-24T17:30:49+00:00

I found the answer I was looking for. It turns out that you don’t query the device for this information; you query the kernel object (in OpenCL). My source is:

http://www.hpc.lsu.edu/training/tutorials/sc10/tutorials/SC10Tutorials/docs/M13/M13.pdf

(Page 108)

which says:

The most efficient work group sizes are likely to be multiples of the native hardware execution width

wavefront size in AMD speak/warp size in Nvidia speak

Query device for CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE
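To illustrate the first point of the quote, here is a minimal sketch in C of rounding a requested work size up to a multiple of the execution width. The helper name round_up_to_multiple is my own, not part of OpenCL, and the multiple is assumed to come from the query named in the quote.

```c
#include <stddef.h>

/* Hypothetical helper (not an OpenCL API): round n up to the nearest
 * multiple of m, where m would be the value reported for
 * CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE.
 * Returns 0 if m is 0, since no valid multiple exists in that case. */
size_t round_up_to_multiple(size_t n, size_t m)
{
    if (m == 0)
        return 0;
    size_t rem = n % m;
    return rem == 0 ? n : n + (m - rem);
}
```

For example, with a wavefront size of 64, a request for 1000 work-items would be padded to 1024; the kernel then needs a bounds check so the extra work-items do nothing.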

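So, in short, the answer appears to be to call the clGetKernelWorkGroupInfo() function on the kernel object, with a param_name of CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE (available since OpenCL 1.1). Note that the specification describes this value as a performance hint rather than a guaranteed hardware width, although on GPUs it commonly corresponds to the warp or wavefront size. See this link for more information on this function: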

http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clGetKernelWorkGroupInfo.html
