Why does this kernel produce incoherent stores?


Asked: May 23, 2026


Why does this kernel produce incoherent stores

__global__ void reverseArrayBlock(int *d_out, int *d_in)
{
    int inOffset  = blockDim.x * blockIdx.x;
    int outOffset = blockDim.x * (gridDim.x - 1 - blockIdx.x);
    int in  = inOffset + threadIdx.x;
    int out = outOffset + (blockDim.x - 1 - threadIdx.x);
    d_out[out] = d_in[in];
}

and this one doesn’t?

__global__ void reverseArrayBlock(int *d_out, int *d_in)
{
    extern __shared__ int s_data[];

    int inOffset  = blockDim.x * blockIdx.x;
    int in  = inOffset + threadIdx.x;

    // Load one element per thread from device memory and store it 
    // *in reversed order* into temporary shared memory
    s_data[blockDim.x - 1 - threadIdx.x] = d_in[in];

    // Block until all threads in the block have written their data to shared mem
    __syncthreads();

    // write the data from shared memory in forward order, 
    // but to the reversed block offset as before

    int outOffset = blockDim.x * (gridDim.x - 1 - blockIdx.x);

    int out = outOffset + threadIdx.x;
    d_out[out] = s_data[threadIdx.x];
}

I’m aware that the second one uses shared memory. But when I look at the indices of d_out, they seem to be the same in both kernels. Could you help me understand this?


1 Answer

Editorial Team · Answer 1 · 2026-05-23T00:23:31+00:00


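Coalescing requires that the addresses within a warp follow a “base + tid” pattern, where tid is the thread index: as tid increases, so does the address. The comment in your second kernel calls this “forward order”. The first kernel writes d_out[outOffset + (blockDim.x - 1 - threadIdx.x)], so the address decreases as tid increases, i.e. the accesses are in “backward order” and are not coalesced. The second kernel does the reversal in shared memory and then writes d_out[outOffset + threadIdx.x], which is in forward order. Each block writes the same set of d_out indices in both kernels; what differs is which thread writes which index.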
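To make the difference concrete, here is a minimal host-side sketch that reproduces only the index arithmetic of the two kernels and checks it against the “base + tid” pattern. It is not GPU code and does not model alignment or any other hardware rule. The names `Launch`, `outIndexDirect`, `outIndexShared`, `followsBasePlusTid` and `sortedIndices` are hypothetical helpers introduced for illustration; they do not appear in the question.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the built-ins blockDim.x, gridDim.x and blockIdx.x.
struct Launch {
    int blockDim;
    int gridDim;
    int blockIdx;
};

// d_out index written by thread tid in the first kernel (direct reversal).
int outIndexDirect(const Launch& l, int tid) {
    int outOffset = l.blockDim * (l.gridDim - 1 - l.blockIdx);
    return outOffset + (l.blockDim - 1 - tid);
}

// d_out index written by thread tid in the second kernel (shared-memory version).
int outIndexShared(const Launch& l, int tid) {
    int outOffset = l.blockDim * (l.gridDim - 1 - l.blockIdx);
    return outOffset + tid;
}

// True if every thread tid of the block writes index base + tid,
// where base is the index written by thread 0.
// Assumption: only the ordering is checked, not the alignment of base.
template <typename IndexFn>
bool followsBasePlusTid(const Launch& l, IndexFn indexOf) {
    assert(l.blockDim > 0);
    int base = indexOf(l, 0);
    for (int tid = 0; tid < l.blockDim; ++tid) {
        if (indexOf(l, tid) != base + tid) {
            return false;
        }
    }
    return true;
}

// All indices written by one block, sorted, so that two kernels can be
// compared on the set of locations they touch rather than on the order.
template <typename IndexFn>
std::vector<int> sortedIndices(const Launch& l, IndexFn indexOf) {
    std::vector<int> indices;
    for (int tid = 0; tid < l.blockDim; ++tid) {
        indices.push_back(indexOf(l, tid));
    }
    std::sort(indices.begin(), indices.end());
    return indices;
}
```

For example, with an assumed launch of 4 blocks of 256 threads, block 1 has outOffset 512. Thread 0 writes index 767 in the first kernel and index 512 in the second. `sortedIndices` returns the same list for both kernels, but only `outIndexShared` passes `followsBasePlusTid`.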
