This is the output of ginfo using Jacket/MATLAB:
Detected CUDA-capable GPUs:
CUDA driver 270.81, CUDA toolkit 4.0
GPU0 Tesla C1060, 4096 MB, Compute 1.3 (single,double) (in use)
GPU1 Tesla C1060, 4096 MB, Compute 1.3 (single,double)
GPU2 Quadro FX 1800, 742 MB, Compute 1.1 (single)
Display Device: GPU2 Quadro FX 1800
My questions are:
- Can I use the two Teslas at the same time (parfor)? How?
- How can I find out how many cores are currently executing the program?
- After running the following code with the Quadro selected (in use), I found that it takes less time than on the Tesla, even though the Tesla has 240 cores and the Quadro has only 64. Is that because the Quadro is the display device? Or because the Quadro is single precision and the Tesla is double precision?
clc; clear all; close all;
addpath('C:/Program Files/AccelerEyes/Jacket/engine');
i = im2double(imread('cameraman.tif'));
i_gpu = gdouble(i);              % copy the image to the GPU
h = fspecial('motion', 50, 45);  % create predefined 2-D filter
h_gpu = gdouble(h);              % copy the filter to the GPU
tic;
for j = 1:500
    x_gpu = imfilter(i_gpu, h_gpu);
end
i2 = double(x_gpu);              % memory transfer back to the host
t = toc
figure(2), imshow(i2);
Any help with the code will be appreciated. As you can see, it's a very trivial example used to demonstrate the power of the GPU, nothing more.
Using two Teslas at the same time: write a MEX file and call cudaSetDevice(0), launch one kernel, then call cudaSetDevice(1) and launch another kernel. (cudaSetDevice is the call that makes a device current; cudaChooseDevice only returns the device that best matches a set of properties.) Kernel launches and the asynchronous memory copies (cudaMemcpyAsync and cudaMemcpyPeerAsync) return control to the host immediately, so both GPUs can work at the same time. I've given an example of how to write a MEX file (i.e., a DLL) in one of my other answers. Just add a second kernel to that example.

FYI, you don't need Jacket if you can do some C/C++ programming. On the other hand, if you don't want to spend your time learning the CUDA SDK, or you don't have a C/C++ compiler, then you're stuck with Jacket or gp-you or GPUlib until MATLAB changes the way that parfor works.

An alternative is to call OpenCL from MATLAB (again through a MEX file). Then you could launch kernels on all the GPUs and CPUs. Again, this requires some C/C++ programming.
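To make the two-device idea concrete, here is a minimal sketch of the host code, written as a standalone program rather than a MEX file. It assumes that devices 0 and 1 are the two Teslas (as in the ginfo output) and that the toolkit is CUDA 4.0 or later, where one host thread can switch between devices. The kernel scale, the buffer size, and the CHECK macro are made up for illustration. The sketch uses cudaSetDevice to switch the current device. In a MEX file the body of main would go inside mexFunction instead.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a runtime call fails.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,         \
                    cudaGetErrorString(err_));                         \
            exit(EXIT_FAILURE);                                        \
        }                                                              \
    } while (0)

// Hypothetical kernel: multiplies every element by a constant, in place.
__global__ void scale(float *data, int n, float factor)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) data[idx] *= factor;
}

int main()
{
    const int numDevices = 2;      // assumption: devices 0 and 1 are the Teslas
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h_buf[numDevices];      // pinned host buffers, one per device
    float *d_buf[numDevices];      // device buffers, one per device
    cudaStream_t stream[numDevices];

    // Allocate resources on each device in turn.
    for (int dev = 0; dev < numDevices; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaStreamCreate(&stream[dev]));
        CHECK(cudaMallocHost((void **)&h_buf[dev], bytes));  // pinned memory
        CHECK(cudaMalloc((void **)&d_buf[dev], bytes));
        for (int i = 0; i < n; ++i) h_buf[dev][i] = 1.0f;
    }

    // Queue copy-in, kernel, and copy-out on each device. These calls return
    // to the host right away, so the loop reaches device 1 while device 0
    // is still busy.
    for (int dev = 0; dev < numDevices; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaMemcpyAsync(d_buf[dev], h_buf[dev], bytes,
                              cudaMemcpyHostToDevice, stream[dev]));
        scale<<<(n + 255) / 256, 256, 0, stream[dev]>>>(d_buf[dev], n, 2.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h_buf[dev], d_buf[dev], bytes,
                              cudaMemcpyDeviceToHost, stream[dev]));
    }

    // Wait for each device to finish, report one value, and clean up.
    for (int dev = 0; dev < numDevices; ++dev) {
        CHECK(cudaSetDevice(dev));
        CHECK(cudaStreamSynchronize(stream[dev]));
        printf("device %d: first element = %f\n", dev, h_buf[dev][0]);
        CHECK(cudaFree(d_buf[dev]));
        CHECK(cudaFreeHost(h_buf[dev]));
        CHECK(cudaStreamDestroy(stream[dev]));
    }
    return 0;
}
```

The host buffers are allocated with cudaMallocHost because cudaMemcpyAsync only overlaps with other work when the host memory is page-locked. Something like nvcc two_gpus.cu -o two_gpus should build it (the file name is arbitrary).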