I need to load 128-bit data per thread in CUDA C++


Asked: June 9, 2026

I need to load 128 bits of data per thread in CUDA C++. Which approach is best in this case for maximum performance and for compatibility with CPU code?
Will the following ways of accessing the data perform equally?

1: Use two 64-bit variables:

unsigned __int64 src1 = arr[2*threadIdx.x];      // first 64-bit half for this thread
unsigned __int64 src2 = arr[2*threadIdx.x + 1];  // second 64-bit half for this thread

2: Use a struct:

struct T_src { unsigned __int64 src1, src2; };
T_src src = arr[threadIdx.x];

3: Use a CUDA built-in vector type:

ulong2 src = arr[threadIdx.x];


1 Answer

Editorial Team · Answer 1 · 2026-06-09T23:43:07+00:00

Accessing memory in the GPU’s “native” terms, using CUDA-defined types and primitives, is the most likely way to maximize performance. This means option #3 in your question.
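One reason the built-in vector types tend to help is alignment: the CUDA programming guide notes that a 16-byte access can compile to a single load instruction only when the type is 16 bytes in size and 16-byte aligned. The sketch below is an illustration under assumptions, not the asker's code: `T_src16`, `is_aligned16`, `load128`, and `copy128` are hypothetical names. It contrasts the plain struct from option #2 with a 16-byte-aligned variant, and the kernel is compiled only under nvcc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Option #2 from the question: 16 bytes in size, but its alignment is only
// that of a 64-bit integer, so a 16-byte-aligned address is not guaranteed.
struct T_src { std::uint64_t src1, src2; };

// Hypothetical variant: same members, alignment raised to 16 bytes.
struct alignas(16) T_src16 { std::uint64_t src1, src2; };

static_assert(sizeof(T_src) == 16 && sizeof(T_src16) == 16, "both hold 128 bits");
static_assert(alignof(T_src16) == 16, "T_src16 is 16-byte aligned");

// True when p meets the alignment needed for one 128-bit access.
inline bool is_aligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Host-side reference: element i is what thread i would read on the GPU.
inline T_src16 load128(const T_src16* arr, std::size_t i) {
    assert(is_aligned16(arr));
    return arr[i];
}

#ifdef __CUDACC__
// Device side (nvcc only): one 128-bit element per thread.
__global__ void copy128(const T_src16* in, T_src16* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}
#endif
```

Whether the compiler actually emits a single 128-bit load is best confirmed by inspecting the generated PTX or SASS for the target architecture.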

If you intend to write code that will run on CUDA and can also run on a stand-alone CPU when recompiled, I’d suggest coding for CUDA performance first and then back-porting for host CPU execution. CUDA is pickier about how things must be set up or structured than most host CPU architectures, and the performance benefits of doing things “right” for CUDA will far exceed the costs of doing things slightly suboptimally for the host CPU case.

I’d still use option #3 for the CUDA case and define a ulong2 structure for the host CPU case. Copying that structure around in the host CPU case will still require two (or four) memory moves behind the scenes, but it’s going to require that no matter what you do in source code. Use the simplest, easiest to read and understand source style and let the compiler take care of the heavy lifting.
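A minimal sketch of that arrangement, assuming a single source file built either by nvcc or by a plain host compiler: `vec128`, `HOST_DEVICE`, `xor_halves`, `xor_all_host`, and `xor_all_kernel` are hypothetical names chosen for illustration. One caveat worth checking: the CUDA programming guide lists `ulong2` as 16-byte aligned only where `long` is 64 bits. Where `long` is 32 bits (64-bit Windows, for example) it holds 64 bits in total, so this sketch uses `ulonglong2`, which is 128 bits on all platforms.

```cpp
#include <cassert>
#include <cstddef>

#ifdef __CUDACC__
// Under nvcc the built-in vector types are already declared.
typedef ulonglong2 vec128;
#define HOST_DEVICE __host__ __device__
#else
// Host-only build: a stand-in with the same member names, size and alignment.
struct alignas(16) vec128 { unsigned long long x, y; };
#define HOST_DEVICE
#endif

static_assert(sizeof(vec128) == 16, "vec128 must hold 128 bits");

// Shared per-element work, written once for both targets.
HOST_DEVICE inline unsigned long long xor_halves(const vec128& v) {
    return v.x ^ v.y;
}

// CPU path: a plain loop over the same data layout.
inline void xor_all_host(const vec128* in, unsigned long long* out, std::size_t n) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t i = 0; i < n; ++i) out[i] = xor_halves(in[i]);
}

#ifdef __CUDACC__
// GPU path: one 128-bit element per thread.
__global__ void xor_all_kernel(const vec128* in, unsigned long long* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = xor_halves(in[i]);
}
#endif
```

Keeping the element type and the per-element function shared means only the loop (CPU) and the kernel launch (GPU) differ between the two builds.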
