Hey
I’ve seen on a website this example kernel
__global__ void loop1( int N, float alpha, float* x, float* y ) {
int i;
int i0 = blockIdx.x*blockDim.x + threadIdx.x;
for(i=i0;i<N;i+=blockDim.x*gridDim.x) {
y[i] = alpha*x[i] + y[i];
}
}
To compute this function in C
for(i=0;i<N;i++) {
y[i] = alpha*x[i] + y[i];
}
Surely the for loop inside the kernel isn’t necessary? and you can just do y[i0] = alpha*x[i0] + y[i0] and remove the for loop altogether.
I’m just curious as to why it’s there and what it’s purpose is. This is assuming a kernel call such as loop1<<<64,256>>>> so presumably gridDim.x = 1
You need the for loop in the kernel if your vector has more entrys than you have started threads. If it’s possible it is of course more efficent to start enough threads.