I want to partition a large data set and split the work on multiple

Asked: June 16, 2026 (2026-06-16T04:38:07+00:00)

I want to partition a large data set and split the work across multiple GPUs. I want to make the data static so that I don’t have to load it onto the GPU again for a second run. The problem is that pthread_create requires all input data to be assembled into a “struct”, and I am not sure whether assembling a bunch of static data into a struct will work. Thanks for any suggestions.

1 Answer

Editorial Team · Answer 1 · 2026-06-16T04:38:09+00:00

In “modern” multi-GPU CUDA, it is no longer necessary to use a separate host thread to hold a context on each device. Since CUDA 4.0, the API is thread safe, and one host thread can hold and work with multiple contexts simply by calling cudaSetDevice.

A really, really basic example of how to distribute a large dataset over multiple GPUs in CUDA 4.x or CUDA 5 could be as simple as:

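// Split host_array (length N) into ngpus contiguous chunks, one per device.
const int blen = N / ngpus;          // base chunk length
int remainder = N;
int* plens = new int[ngpus];         // chunk length on each device
float** pvals = new float*[ngpus];   // device pointer for each chunk
float* source = &host_array[0];
for (int i = 0; i < ngpus; i++) {
    plens[i] = blen;
    remainder -= blen;
    if (i == ngpus - 1) {
        // The last device also takes the N % ngpus leftover elements.
        plens[i] += remainder;
        remainder = 0;
    }
    size_t sz = sizeof(float) * size_t(plens[i]);
    cudaSetDevice(i);                // later runtime calls target device i
    cudaMalloc((void **)&pvals[i], sz);
    cudaMemcpy(pvals[i], source, sz, cudaMemcpyHostToDevice);
    source += plens[i];
}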

[disclaimer: written in browser, never compiled or tested, use at your own risk]

This assumes that the GPUs are numbered sequentially from 0 to ngpus-1 and that the source data is held in the floating point array host_array of length N. You get back an array of device pointers in pvals and the length of each chunk in plens. Note that each pointer is only valid in the context in which you allocated it, so make sure you select the device with cudaSetDevice before using the pointer in a kernel launch or API call.
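
The chunk lengths can be worked out on the host before any device call is made. Below is a minimal host-only sketch of that arithmetic. The helper names partition_lengths and partition_offsets are hypothetical (they are not part of the original answer or of the CUDA API). This variant spreads the leftover elements one per device, which is one alternative to adding the whole leftover to a single chunk, and the lengths always sum to N.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of the CUDA API: returns how many of the
// n elements each of the ngpus devices should receive.
std::vector<std::size_t> partition_lengths(std::size_t n, int ngpus) {
    assert(ngpus > 0);
    const std::size_t g = static_cast<std::size_t>(ngpus);
    const std::size_t base = n / g;   // elements every device gets
    const std::size_t extra = n % g;  // leftover elements, fewer than ngpus
    std::vector<std::size_t> lens(g, base);
    for (std::size_t i = 0; i < extra; ++i) {
        lens[i] += 1;  // the first `extra` devices take one more element
    }
    return lens;
}

// Hypothetical helper: starting offset of each chunk in the host array
// (an exclusive prefix sum over the chunk lengths).
std::vector<std::size_t> partition_offsets(const std::vector<std::size_t>& lens) {
    std::vector<std::size_t> offs(lens.size(), 0);
    for (std::size_t i = 1; i < lens.size(); ++i) {
        offs[i] = offs[i - 1] + lens[i - 1];
    }
    return offs;
}
```

With lens[i] used for plens[i] and host_array + offs[i] used as the copy source for device i, the copy loop would no longer need a running remainder.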
