In CUDA, there is the concept of a warp: the group of threads that execute the same instruction in lockstep on a single multiprocessor. On NVIDIA hardware, the warp size is 32 for all of their cards currently on the market.
ATI cards have a similar concept, but there it is called a wavefront. After some hunting around, I found that my ATI card has a wavefront size of 64.
My question is: how can I query this SIMD width at runtime in OpenCL?
I found the answer I was looking for. It turns out that in OpenCL you don’t query the device for this information; you query the kernel object. My source is:
http://www.hpc.lsu.edu/training/tutorials/sc10/tutorials/SC10Tutorials/docs/M13/M13.pdf
(Page 108)
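In short, the answer appears to be to call clGetKernelWorkGroupInfo() with a param_name of CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE (available from OpenCL 1.1 onward). Note that the spec describes this value as a performance hint rather than a guaranteed hardware width, although on GPUs it generally corresponds to the warp or wavefront size. See the reference page for more information on this function: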
http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/clGetKernelWorkGroupInfo.html