I have following kernel code for matrix manipulation. Matrix A = 1*3 and Matrix B = 3*3 resultant Matrix C would be 1*3. In the following code the width would be 3.
__global__void MatrixMulKernel(float* d_M,float* d_N,float* d_P,int Width) {
int row = blockIdx.y * blockDim.y + threadIdx.y;
int col = blockIdx.x * blockDim.x + threadIdx.x;
if(row>=Width || col>=Width){ // matrix range
return;
}
float P_val = 0.0f;
for (int k = 0; k < Width; ++k) {
float M_elem = d_M[row * Width + k];
float N_elem = d_N[k * Width + col];
P_val += M_elem * N_elem;
}
d_p[row*Width+col] = P_val;
}
I kernel code is called as follows
int block_size = 32;
dim3 dimGrid(Width/block_size, Width/block_size);
dim3 dimBlock(block_size, block size);
MatrixMulKernel<<<dimGrid, dimBlock>>>(d_M, d_N, d_P,3);
But I am getting wrong results. I am getting results as zero always.
Can anyone help me please.
The code looks likes its for multiplication of 2 square matrices of same size.
Width is the number of columns of the first matrix.
You have to provide this as an argument to the function.