I’m having trouble understanding why my function, which finds the maximum and minimum in a range of doubles using CUBLAS, doesn’t work properly.
The code is as follows:
void findMaxAndMinGPU(double* values, int* max_idx, int* min_idx, int n)
{
    double* d_values;
    cublasHandle_t handle;
    cublasStatus_t stat;

    safecall( cudaMalloc((void**) &d_values, sizeof(double) * n), "cudaMalloc (d_values) in findMaxAndMinGPU");
    safecall( cudaMemcpy(d_values, values, sizeof(double) * n, cudaMemcpyHostToDevice), "cudaMemcpy (h_values > d_values) in findMaxAndMinGPU");

    cublasCreate(&handle);

    stat = cublasIdamax(handle, n, d_values, sizeof(double), max_idx);
    if (stat != CUBLAS_STATUS_SUCCESS)
        printf("Max failed\n");

    stat = cublasIdamin(handle, n, d_values, sizeof(double), min_idx);
    if (stat != CUBLAS_STATUS_SUCCESS)
        printf("min failed\n");

    cudaFree(d_values);
    cublasDestroy(handle);
}
Here values holds the values to search. The max_idx and min_idx arguments receive the indices of the maximum and minimum found in values.
The results from the CUBLAS calls seem rather random, and the indices returned are wrong.
Anyone with a golly good answer to my problem? I am a tad sad at the moment 🙁
One of your arguments to both the cublasIdamax and cublasIdamin calls is wrong. The incx argument in BLAS level 1 calls should always be the stride of the input in words (elements), not bytes. So I suspect that you want to pass 1 rather than sizeof(double) as incx in both calls.
By using sizeof(double) you are telling the routines to use a stride of 8, which will have the calls overrun the allocated storage of the input array and into uninitialised memory. I presume you actually have a stride of 1 in d_values.

Edit: Here is a complete runnable example which works correctly. Note I switched the code to single precision because I don’t presently have access to double precision capable hardware:
which when compiled and run gives this:
Note that CUBLAS follows the FORTRAN convention and uses 1-based indexing rather than zero-based indexing, which is why there is a difference of 1 between the CUBLAS and CPU versions.
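To make the two conventions above concrete (incx counted in elements, and results returned as 1-based indices), here is a minimal CPU-only sketch in plain C. It is not the CUBLAS implementation and not the example from this answer. The names ref_idamax, ref_idamin, last_index_read and to_zero_based are hypothetical helpers invented for illustration. The sketch assumes the usual BLAS behaviour, where the "amax" and "amin" routines compare absolute values and return the first matching position.

```c
#include <assert.h>
#include <stddef.h>

/* Absolute value without needing the math library. */
static double abs_val(double v) { return v < 0.0 ? -v : v; }

/* Hypothetical helper: highest element index read when visiting n
   elements with stride incx, where incx is counted in elements. */
size_t last_index_read(size_t n, size_t incx)
{
    assert(n > 0 && incx > 0);
    return (n - 1) * incx;
}

/* CPU sketch of the BLAS idamax convention: 1-based position of the
   first element with the largest absolute value. */
int ref_idamax(int n, const double *x, int incx)
{
    assert(n > 0 && incx > 0);
    int best = 0;
    for (int i = 1; i < n; ++i)
        if (abs_val(x[i * incx]) > abs_val(x[best * incx]))
            best = i;
    return best + 1; /* FORTRAN-style 1-based result */
}

/* Same idea for the smallest absolute value. */
int ref_idamin(int n, const double *x, int incx)
{
    assert(n > 0 && incx > 0);
    int best = 0;
    for (int i = 1; i < n; ++i)
        if (abs_val(x[i * incx]) < abs_val(x[best * incx]))
            best = i;
    return best + 1;
}

/* Convert a 1-based BLAS result into a zero-based C array index. */
int to_zero_based(int blas_index)
{
    return blas_index - 1;
}
```

For a 4-element array, last_index_read(4, 1) is 3, which stays inside the array, while last_index_read(4, 8) is 24, far past the end. That is the overrun described above when sizeof(double) is passed as incx. Since the comparison is on absolute values, a large negative entry counts as the "maximum" under this convention.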