I made a simple CUDA program for practice. It just copies data from one array to another:
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
from pycuda.compiler import SourceModule
# Global constants
N = 2**20 # size of array a
a = np.linspace(0, 1, N)
e = np.empty_like(a)
block_size_x = 512
# Instantiate block and grid sizes.
block_size = (block_size_x, 1, 1)
grid_size = (N / block_size_x, 1)
# Create the CUDA kernel, and run it.
mod = SourceModule("""
__global__ void D2x_kernel(double* a, double* e, int N) {
    int tid = blockDim.x * blockIdx.x + threadIdx.x;
    if (tid > 0 && tid < N - 1) {
        e[tid] = a[tid];
    }
}
""")
func = mod.get_function('D2x_kernel')
func(a, cuda.InOut(e), np.int32(N), block=block_size, grid=grid_size)
print str(e)
However, I get this error: pycuda._driver.LogicError: cuLaunchKernel failed: invalid value
When I get rid of the second argument double* e in my kernel function and invoke the kernel without the argument e, the error goes away. Why is that? What does this error mean?
Your a array does not exist in device memory, so I suspect that PyCUDA is ignoring (or otherwise handling) the first argument to your kernel invocation and only passing in e and N… so you get an error because the kernel was expecting three arguments and it has only received two. Removing double* e from your kernel definition might eliminate the error message you're getting, but your kernel still won't work properly.
A quick fix to this should be to wrap a in a cuda.In() call, which instructs PyCUDA to copy a to the device before launching the kernel. That is, your kernel launch line should be:
func(cuda.In(a), cuda.InOut(e), np.int32(N), block=block_size, grid=grid_size)
Edit: Also, do you realize that your kernel is not copying the first and last elements of a to e? Your if (tid > 0 && tid < N - 1) statement is preventing that. For the entire array, it should be if (tid < N).
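To see the effect of the bounds check without a GPU, here is a minimal CPU-side sketch in plain Python. It is not PyCUDA code: run_kernel is a hypothetical helper that loops over thread indices the way the kernel's tid would, and None stands in for the uninitialized values that np.empty_like would leave in e.

```python
# CPU-side emulation of the kernel's bounds check (plain Python, no GPU needed).
# run_kernel is a hypothetical helper for illustration, not a PyCUDA function.

def run_kernel(a, guard):
    """Emulate one thread per index; copy a[tid] to e[tid] when guard(tid, n) holds."""
    n = len(a)
    e = [None] * n  # None marks elements that no thread wrote
    for tid in range(n):
        if guard(tid, n):
            e[tid] = a[tid]
    return e

a = [0.0, 0.25, 0.5, 0.75, 1.0]

# Guard from the question: skips the first and last elements.
original = run_kernel(a, lambda tid, n: tid > 0 and tid < n - 1)

# Guard suggested in the answer: covers the whole array.
fixed = run_kernel(a, lambda tid, n: tid < n)

print(original)  # [None, 0.25, 0.5, 0.75, None]
print(fixed)     # [0.0, 0.25, 0.5, 0.75, 1.0]
```

With the original guard the two end elements are never written, so in the real program they would hold whatever happened to be in e beforehand.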