I am confused about the following statements in the CUDA programming guide 4.0 section

Question

Asked: June 9, 2026 (2026-06-09T14:41:30+00:00)

I am confused by the following statements in section 5.3.2.1 of the CUDA Programming Guide 4.0,
in the Performance Guidelines chapter.

Global memory resides in device memory and device memory is accessed
via 32-, 64-, or 128-byte memory transactions. 

These memory transactions must be naturally aligned: Only the 32-, 64-, or
128-byte segments of device memory
that are aligned to their size (i.e. whose first address is a
multiple of their size) can be read or written by memory
transactions.

1) My understanding of device memory was that accesses to device memory by threads are uncached: if a thread accesses memory location a[i], it will fetch only a[i] and none of the
values around a[i]. The first statement seems to contradict this. Or perhaps I am misunderstanding the usage of the phrase “memory transaction” here?

2) The second sentence does not seem very clear. Can someone explain this?


1 Answer

Editorial Team · Answer 1 · 2026-06-09T14:41:31+00:00

Memory transactions are performed per warp. When the 32 threads of a warp read consecutive elements, a 32-byte transaction corresponds to a warp-sized read of an 8-bit type (32 threads × 1 byte), a 64-byte transaction to a warp-sized read of a 16-bit type, and a 128-byte transaction to a warp-sized read of a 32-bit type.
The second statement just means that every transaction has to start on a naturally aligned boundary, that is, at an address that is a multiple of the transaction size. It is not possible for a warp to read a 128-byte transaction with a one-byte offset. See this answer for more details.
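
To make the arithmetic concrete, here is a minimal host-side sketch in C++. It is not taken from the programming guide: the helper names (`is_naturally_aligned`, `warp_request_bytes`, `segments_touched`) are hypothetical, a warp size of 32 is assumed, and the code only counts how many aligned segments a contiguous warp-wide read overlaps. What the hardware actually issues also depends on the compute capability and cache configuration, so treat this as an illustration of the alignment rule rather than a model of a specific GPU.

```cpp
#include <cstdint>

// Assumed warp size (32 threads), as used in the answer above.
const std::uint64_t kWarpSize = 32;

// Hypothetical helper: a segment is naturally aligned when its first
// address is a multiple of its size.
inline bool is_naturally_aligned(std::uint64_t addr, std::uint64_t segment_bytes) {
    return addr % segment_bytes == 0;
}

// Hypothetical helper: bytes requested by one warp when every thread
// reads one element of elem_bytes and the elements are contiguous.
inline std::uint64_t warp_request_bytes(std::uint64_t elem_bytes) {
    return kWarpSize * elem_bytes;
}

// Hypothetical helper: number of aligned segments of segment_bytes that
// overlap the byte range [base, base + warp_request_bytes(elem_bytes)).
inline std::uint64_t segments_touched(std::uint64_t base,
                                      std::uint64_t elem_bytes,
                                      std::uint64_t segment_bytes) {
    const std::uint64_t first = base / segment_bytes;
    const std::uint64_t last = (base + warp_request_bytes(elem_bytes) - 1) / segment_bytes;
    return last - first + 1;
}
```

Under these assumptions, `warp_request_bytes` gives 32, 64, and 128 bytes for 1-, 2-, and 4-byte types. A warp reading 4-byte values starting at byte address 0 or 256 stays inside one aligned 128-byte segment (`segments_touched(0, 4, 128)` is 1), while the same read starting at address 1 straddles two segments (`segments_touched(1, 4, 128)` is 2), which is why it cannot be served by a single 128-byte transaction.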
