Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 3276830
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 17, 20262026-05-17T19:18:08+00:00 2026-05-17T19:18:08+00:00

Searching through the NVIDIA forums I found these questions , which are also of

  • 0

Searching through the NVIDIA forums I found these questions, which are also of interest to me, but nobody had answered them in the last four days or so. Can you help?

Original Forum Post

Digging into OpenCL reading tutorials some things stayed unclear for me. Here is a collection of my questions regarding local and global work sizes.

  1. Must the global_work_size be smaller than CL_DEVICE_MAX_WORK_ITEM_SIZES?
    On my machine CL_DEVICE_MAX_WORK_ITEM_SIZES = 512, 512, 64.

  2. Is CL_KERNEL_WORK_GROUP_SIZE the recommended work_group_size for the used kernel?

    1. Or is this the only work_group_size the GPU allows?
      On my machine CL_KERNEL_WORK_GROUP_SIZE = 512
  3. Do I need to divide into work groups or can I have only one, but not specifying local_work_size?

    1. To what do I have to pay attention, when I only have one work group?
  4. What does CL_DEVICE_MAX_WORK_GROUP_SIZE mean?
    On my machine CL_DEVICE_MAX_WORK_GROUP_SIZE = 512, 512, 64

    1. Does this mean, I can have one work group which is as large as the CL_DEVICE_MAX_WORK_ITEM_SIZES?
  5. Has global_work_size to be a divisor of CL_DEVICE_MAX_WORK_ITEM_SIZES?
    In my code global_work_size = 20.

  • 1 1 Answer
  • 1 View
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-17T19:18:08+00:00Added an answer on May 17, 2026 at 7:18 pm

    In general you can choose global_work_size as big as you want, while local_work_size is constraint by the underlying device/hardware, so all query results will tell you the possible dimensions for local_work_size instead of the global_work_size. the only constraint for the global_work_size is that it must be a multiple of the local_work_size (for each dimension).

    The work group sizes specify the sizes of the workgroups so if CL_DEVICE_MAX_WORK_ITEM_SIZES is 512, 512, 64 that means your local_work_size can’t be bigger then 512 for the x and y dimension and 64 for the z dimension.

    However there is also a constraint on the local group size depending on the kernel. This is expressed through CL_KERNEL_WORK_GROUP_SIZE. Your cumulative workgoupsize (as in the product of all dimensions, e.g. 256 if you have a localsize of 16, 16, 1) must not be greater then that number. This is due to the limited hardware resources to be divided between the threads (from your query results I assume you are programming on a NVIDIA GPU, so the amount of local memory and registers used by a thread will limit the number of threads which can be executed in parallel).

    CL_DEVICE_MAX_WORK_GROUP_SIZE defines the maximum size of a work group in the same manner as CL_KERNEL_WORK_GROUP_SIZE, but specific to the device instead the kernel (and it should be a a scalar value aka 512).

    You can choose not to specify local_work_group_size, in which case the OpenCL implementation will choose a local work group size for you (so its not a guarantee that it uses only one workgroup). However it’s generally not advisiable, since you don’t know how your work is divided into workgroups and furthermore it’s not guaranteed that the workgroupsize chosen will be optimal.

    However, you should note that using only one workgroup is generally not a good idea performancewise (and why use OpenCL if performance is not a concern). In general a workgroup has to execute on one compute unit, while most devices will have more then one (modern CPUs have 2 or more, one for each core, while modern GPUs can have 20 or more). Furthermore even the one Compute Unit on which your workgroup executes might not be fully used, since several workgroup can execute on one compute unit in an SMT style. To use NVIDIA GPUs optimally you need 768/1024/1536 threads (depending on the generation, meaning G80/GT200/GF100) executing on one compute unit, and while I don’t know the numbers for amd right now, they are in the same magnitude, so it’s good to have more then one workgroup. Furthermore, for GPUs, it’s typically advisable to have workgroups which at least 64 threads (and a number of threads divisible by 32/64 (nvidia/amd) per workgroup), because otherwise you will again have reduced performance (32/64 is the minimum granuaty for execution on gpus, so if you have less items in a workgroup, it will still execute as 32/64 threads, but discard the results from unused threads).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

you would think this would be obvious, but searching through documentation, SAP forums, Googling,
I have been searching through a lot of questions but they don't ask or
I am searching through my DisplayContainer for text fields to change them but i
I've tried searching through search engines,MSDN,etc. but can't anything. Sorry if this has been
I've been searching through forums to find a solution for the problem I'm facing
Each of of these threads is searching through a different list of data objects
I went through lots of searching the web which shows how to draw custom
I've been searching through the web but I have not been able to find
Im trying to build a kd-tree for searching through a set of points, but
I have been searching through SO for the last few days and have found

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.