I’m using PTX from matlab to call CUDA kernels, when testing the code on


Asked: June 18, 2026

I’m using PTX from MATLAB to call CUDA kernels. When testing the code in VS 2010, I launch the kernel like this:

int TPB = 256; 
int BPG = (Nx + TPB -1 ) / TPB;
dim3 blk(TPB,TPB,1);
dim3 grid(BPG ,BPG,1);
grad<<< grid,blk>>>(dev_y,dev_x,dev_b,dev_t,Nx,Ny);

I then tried to use the same configuration in MATLAB:

TPB = 16; 
BPG = floor((Nx + TPB -1 ) / TPB);
grad = parallel.gpu.CUDAKernel('reg.ptx','reg.cu','grad');
grad.ThreadBlockSize=[TPB TPB 1];
grad.GridSize = [BPG BPG];

I knew this was more than 512 threads per block, which is the maximum allowed on my Tesla C1060, and I was right:

Invalid size input to kernel ThreadBlockSize. You must provide a vector of up to 3 positive   integers whose product is <= 512. The maximum value in each dimension is: [512,512,64].

Is there any explanation for why it runs in VS 2010 without an error, unlike what happened in MATLAB?

1 Answer

Editorial Team · Added an answer on June 18, 2026 at 4:16 am

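The C++ code does not check for errors after the grad<<<grid,blk>>> launch, whereas the MATLAB wrapper performs additional error checking. The launch configuration is out of bounds in both cases: a 256 x 256 block is 65,536 threads, far above the 512-thread limit. In C++ the kernel simply never runs, and nothing reports the failure unless you ask for it. Calling cudaGetLastError after the <<<>>> launch will report the launch configuration error.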
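As a minimal sketch of the check described above: dummyKernel and tryLaunch are hypothetical names introduced for illustration, and the empty kernel only stands in for grad, so this shows the error-checking pattern rather than the asker's actual code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the grad kernel; its body does not matter here.
__global__ void dummyKernel() {}

// Launch one tpb x tpb block and return the resulting status.
cudaError_t tryLaunch(unsigned tpb) {
    dim3 blk(tpb, tpb, 1);
    dim3 grid(1, 1, 1);
    dummyKernel<<<grid, blk>>>();

    // The <<<>>> statement itself returns nothing; an invalid launch
    // configuration only becomes visible through cudaGetLastError.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        std::printf("launch failed: %s\n", cudaGetErrorString(err));
        return err;
    }

    // Errors raised while the kernel executes surface on synchronization.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
    }
    return err;
}
```

With the 256 x 256 block from the question, tryLaunch(256) should return an error (typically cudaErrorInvalidConfiguration) instead of cudaSuccess, which is the same condition MATLAB reports up front.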
