The Archive Base

Editorial Team
Asked: May 27, 2026

I am developing a CUDA ray-plane intersection kernel.

Suppose my plane (face) struct is:

typedef struct _Face {
    int ID;
    int matID;

    int V1ID;
    int V2ID;
    int V3ID;

    float V1[3];
    float V2[3];
    float V3[3];

    float reflect[3];

    float emmision[3];
    float in[3];
    float out[3];

    int intersects[RAYS];

} Face;

I pasted the whole struct so you can get an idea of its size. RAYS equals 625 in the current configuration. In the following code, assume that the faces array holds, e.g., 1270 elements (generally, thousands).
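To make the size concrete, here is a minimal host-side C++ sketch that mirrors the struct above and measures it. It assumes 4-byte `int` and `float`, and `faceArrayBytes` is a hypothetical helper name used only for illustration.

```cpp
#include <cstddef>

constexpr int RAYS = 625;  // value quoted in the question

// Same field layout as the Face struct above.
struct Face {
    int ID;
    int matID;
    int V1ID, V2ID, V3ID;
    float V1[3], V2[3], V3[3];
    float reflect[3];
    float emmision[3];
    float in[3], out[3];
    int intersects[RAYS];
};

// Bytes occupied by an array of `faceCount` faces (hypothetical helper).
constexpr std::size_t faceArrayBytes(std::size_t faceCount) {
    return faceCount * sizeof(Face);
}
```

Under that assumption `sizeof(Face)` is 2604 bytes, 2500 of which belong to `intersects`, and an array of 1270 faces takes 3307080 bytes. A local copy such as `Face f = faces[offset];` may therefore move roughly 2.5 KB per thread unless the compiler optimizes it away.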

Until today I launched my kernel in a very naive way:

const int tpb = 64; //threads per block
dim3 grid = (n +tpb-1)/tpb; // n - face count in array
dim3 block = tpb;
//.. some memory allocation etc.
theKernel<<<grid,block>>>(dev_ptr, n);

and inside the kernel I had a loop:

__global__ void theKernel(Face* faces, int faceCount) {
    int offset = threadIdx.x + blockIdx.x*blockDim.x;
    if(offset >= faceCount)
        return;
    Face f = faces[offset];
    //..some initialization
    int RAY = -1;
    for(float alpha=0.0f; alpha<=PI; alpha+= alpha_step ){ 
        for(float beta=0.0f; beta<=PI; beta+= beta_step ){ 
            RAY++;
            //..calculation per ray in (alpha,beta) direction ...
            faces[offset].intersects[RAY] = ...; //some assignment

That is about it. I looped through all the directions and updated the faces array. It worked correctly, but was hardly any faster than the CPU code.

So today I tried to optimize the code by launching the kernel with a much larger number of threads. Instead of 1 thread per face, I want 1 thread per ray of each face (meaning 625 threads work on 1 face). The modifications were simple:

dim3 grid = (n*RAYS +tpb-1)/tpb;  //before launching . RAYS = 625, n = face count

and the kernel itself:

__global__ void theKernel(Face *faces, int faceCount){

int threadNum = threadIdx.x + blockIdx.x*blockDim.x;

int offset = threadNum/RAYS; //RAYS is a global #define
int rayNum = threadNum - offset*RAYS;

if(offset >= faceCount || rayNum != 0)
    return;

Face f = faces[offset];
//initialization and the rest.. again ..

And this code does not work at all. Why? In theory, only the first thread (of the 625 per Face) should do any work, so why does this produce bad (hardly any) results?

Kind regards,
e.

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 8:51 am

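    The maximum size of a grid in any dimension is 65535 (CUDA Programming Guide, Appendix F). This limit applies to devices of compute capability 2.x and lower; later devices allow a much larger x dimension. If your grid size was 1000 before the change, you have increased it to 625000. That is bigger than the limit, so the launch fails and the kernel never runs.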
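The block-count arithmetic can be checked on the host with a small C++ sketch. `blocksNeeded`, `fitsInOneDimension`, and `kMaxGridDim` are hypothetical names, and the 65535 limit is the one quoted above, not a value queried from a device.

```cpp
constexpr long long kMaxGridDim = 65535;  // per-dimension limit quoted in the answer

// Ceiling division used in the launch code: (work + tpb - 1) / tpb.
constexpr long long blocksNeeded(long long workItems, long long tpb) {
    return (workItems + tpb - 1) / tpb;
}

// True if a one-dimensional grid of `blocks` blocks stays within the limit.
constexpr bool fitsInOneDimension(long long blocks) {
    return blocks <= kMaxGridDim;
}
```

As an illustration, 64000 faces with 64 threads per block need 1000 blocks at one thread per face, but 625000 blocks at one thread per ray, which does not fit. With the 1270 faces mentioned in the question the same formula gives 12403 blocks, so under this limit the overflow only appears for larger face counts.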

    If you define the grid size as

    dim3 grid((n + tpb - 1) / tpb, RAYS);
    

    then all grid dimensions will be smaller than the limit. You’ll also have to change the way blockIdx is used in the kernel.
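One possible way to use `blockIdx` with that two-dimensional grid is sketched below as host-side C++, so the arithmetic can be run without a GPU. `Idx`, `FaceRay`, and `mapThread` are hypothetical stand-ins; in a real kernel the built-in `blockIdx`, `threadIdx`, and `blockDim` would take the place of the parameters. This is only one mapping that fits the grid above, not necessarily the one the answer had in mind.

```cpp
// Stand-in for the x and y components of the CUDA built-in index variables.
struct Idx { unsigned x, y; };

// Result of the mapping: which face and ray a thread handles, and whether
// the thread falls inside the faces array at all.
struct FaceRay { int face; int ray; bool active; };

// With grid = dim3((n + tpb - 1) / tpb, RAYS):
//   blockIdx.x and threadIdx.x select the face, blockIdx.y selects the ray.
inline FaceRay mapThread(Idx blockIdx, Idx threadIdx, Idx blockDim, int faceCount) {
    int face = static_cast<int>(threadIdx.x + blockIdx.x * blockDim.x);
    int ray = static_cast<int>(blockIdx.y);
    return {face, ray, face < faceCount};
}
```

With 64 threads per block and 1270 faces, thread 53 of block (19, 624) maps to face 1269 and ray 624, while thread 54 of the same block row maps to face 1270 and should return early, just like the `offset >= faceCount` check in the original kernel.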

© 2021 The Archive Base. All Rights Reserved