I’m just messing around trying to learn a little bit about parallel computing. If

Asked: May 27, 2026

I’m just messing around trying to learn a little bit about parallel computing. If I have something that looks like this,

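#include <stdlib.h>
#include <time.h>

long A[12];
long B[5000000];
long C[12];
long long total = 0;
long long tmp = 0;

/* Placeholder for the work that would run on the GPU. */
void GPUKernel(void) {
    long n, n2;

    for (n = 0; n < 5000000; ++n) {
        B[n] = 0;
    }

    for (n = 0; n < 5000000; ++n) {
        for (n2 = 0; n2 < 12; ++n2) {
            B[n] += C[A[n2]];
        }
        tmp += B[n];
    }

    if (tmp > total) {
        total = tmp;
    }
    tmp = 0;  /* reset for the next call so sums do not carry over between combinations */
}

int main(void) {
    long long n;  /* 12^12 does not fit in 32 bits */
    int p;

    srand((unsigned)time(NULL));

    for (n = 0; n < 12; ++n) {
        C[n] = rand() % 1000000;
    }

    /* 8916100448256 = 12^12: every combination of twelve base-12 digits in A */
    for (n = 0; n < 8916100448256LL; ++n) {
        GPUKernel();  /* run before advancing, so A[11] never reaches 12 inside the kernel */

        /* advance A like an odometer: carry into the next digit on overflow */
        ++A[0];
        for (p = 0; p < 11; ++p) {
            if (A[p] == 12) {
                A[p] = 0;
                ++A[p + 1];
            }
        }
    }

    return 0;
}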

My idea is to find out how many threads the CPU can use, for example 4, and then make a separate copy of all the data for each CPU thread I create, so that each GPU kernel has its own data as well. Does this make sense? Would this be a good way of going about this task?

//cpu core 1
for (n=0; n < 8916100448256/4 ; ++n) {
    ...
GPUKernel1();
}

//cpu core 2
for (n=(8916100448256/4); n < (8916100448256/4)*2 ; ++n) {
   ...  
GPUKernel2();
}

//cpu core 3
for (n=(8916100448256/4)*2; n < (8916100448256/4)*3 ; ++n) {
   ...       
GPUKernel3();
}

//cpu core 4
for (n=(8916100448256/4)*3; n < 8916100448256 ; ++n) {
   ...      
GPUKernel4();
}
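
One detail the four loops above gloss over: a loop that starts at n = 8916100448256/4 also needs its own copy of A to start at the matching combination, not at all zeros. Here is a rough sketch of how the split and the starting state could be computed. The names `Range`, `chunk_for` and `seed_odometer` are made up for illustration, and it assumes each worker keeps a private copy of A whose digits are stored least significant first, as in the carry loop in main.

```c
#include <stdint.h>

#define DIGITS 12                  /* length of A in the question */
#define BASE   12                  /* each A[p] runs over 0..11 */
#define TOTAL  8916100448256LL     /* 12^12 combinations */

/* Half-open range [begin, end) of combination indices owned by one worker. */
typedef struct {
    int64_t begin;
    int64_t end;
} Range;

/* Split total as evenly as possible; the first (total % workers)
   workers each take one extra index. */
Range chunk_for(int worker, int workers, int64_t total) {
    int64_t base = total / workers;
    int64_t extra = total % workers;
    Range r;
    r.begin = worker * base + (worker < extra ? worker : extra);
    r.end = r.begin + base + (worker < extra ? 1 : 0);
    return r;
}

/* Write index as DIGITS base-12 digits, least significant first,
   so a worker's private A starts at the right combination. */
void seed_odometer(int64_t index, long a[DIGITS]) {
    for (int p = 0; p < DIGITS; ++p) {
        a[p] = (long)(index % BASE);
        index /= BASE;
    }
}
```

Worker k out of 4 would call chunk_for(k, 4, TOTAL), pass the resulting begin to seed_odometer with its own array, and then loop from begin to end with the usual carry step. Keeping a separate running maximum per worker and combining them at the end avoids having the workers share total.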

1 Answer

Editorial Team · Answer 1 · 2026-05-27T20:55:02+00:00

Correct me if I’m wrong, but this seems like an algorithms question; OpenCL is nowhere in the picture. BTW, when you write kernel code in OpenCL/CUDA, the data each thread works on is determined by that thread's ID, and you can also divide the work into blocks. Please refer to the programming guide for your platform (NVIDIA/AMD).
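
To make the thread-ID point a bit more concrete, here is a plain C sketch (not real OpenCL or CUDA code) that emulates the indexing on the CPU. In CUDA the global index is typically computed as blockIdx.x * blockDim.x + threadIdx.x, and OpenCL exposes the equivalent through get_global_id(0). Everything named here (`BLOCK_SIZE`, `global_id`, `blocks_for`, `kernel_body`, `run_grid`) is hypothetical and only illustrates the idea.

```c
#include <stddef.h>

#define BLOCK_SIZE 256  /* hypothetical number of threads per block */

/* Global ID formed from a block index and a thread index within the block. */
size_t global_id(size_t block, size_t thread) {
    return block * BLOCK_SIZE + thread;
}

/* Number of blocks needed to cover n elements, rounded up. */
size_t blocks_for(size_t n) {
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

/* Stand-in for a kernel body: the "thread" with this ID touches only b[id]. */
void kernel_body(size_t id, long *b, size_t n, long value) {
    if (id < n) {  /* IDs past the end of the data do nothing */
        b[id] = value;
    }
}

/* Sequential emulation of a launch; on a GPU these iterations
   would run in parallel instead of one after another. */
void run_grid(long *b, size_t n, long value) {
    for (size_t block = 0; block < blocks_for(n); ++block) {
        for (size_t thread = 0; thread < BLOCK_SIZE; ++thread) {
            kernel_body(global_id(block, thread), b, n, value);
        }
    }
}
```

Applied to the question's code, the loop over the 5,000,000 elements of B would disappear from the kernel: each thread would handle the single B[n] selected by its ID, so no per-thread copy of B is needed.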
