Editorial Team
Asked: May 25, 2026

Please explain how memory access works in the following kernel:

// totalThreadsNumber was undeclared in the original snippet; it is assumed
// here to be passed in as a kernel argument.
__global__ void kernel(float4 *a, int totalThreadsNumber)
{
     int tid = blockIdx.x * blockDim.x + threadIdx.x;

     float4 reg1, reg2;
     reg1 = a[tid]; // each thread reads a unique memory location

     for (int i = 0; i < totalThreadsNumber; i++)
     {
          reg2 = a[i]; // all running threads read
                       // the same global memory location
          // some computations
     }

     for (int i = 0; i < totalThreadsNumber; i++)
     {
          a[i] = reg1; // all running threads write
                       // to the same global memory location:
                       // race condition
     }
}

How does it work in the first loop? Is there some serialization? I assume that the second loop causes thread serialization (only within a warp?) and that the result is undefined.

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 4:22 pm

    I'll keep my explanation to Fermi (sm_2x); on older hardware, memory accesses are handled per half-warp instead.

    In the first loop (reading), the whole warp reads from the same address into a local variable. This results in a “broadcast”. Since Fermi has an L1 cache, either one cache line is loaded or, on subsequent iterations, the data is fetched directly from the cache. In other words, there is no serialisation.
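    The uniform-read pattern from the first loop can be sketched as a small program. This is only an illustration under stated assumptions: the kernel name `sumAllUniform`, the host helper `runUniformRead`, and the output array are hypothetical and not part of the question. It checks that every thread sees the same data; it does not measure whether the hardware actually broadcasts the load.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// In iteration i, every thread requests the same address a[i],
// which is the access pattern of the first loop in the question.
__global__ void sumAllUniform(const float4 *a, float *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        float4 v = a[i];                 // same address for all threads
        acc += v.x + v.y + v.z + v.w;    // stand-in for "some computations"
    }
    if (tid < n) out[tid] = acc;         // each thread writes its own slot
}

// Hypothetical host helper: fills the input with ones, runs the kernel with
// one thread per element, and reports whether every thread computed 4 * n.
bool runUniformRead(int n, int threadsPerBlock)
{
    assert(n > 0 && threadsPerBlock > 0);
    std::vector<float4> hostIn(n, make_float4(1.0f, 1.0f, 1.0f, 1.0f));
    std::vector<float> hostOut(n, 0.0f);

    float4 *devIn = nullptr;
    float *devOut = nullptr;
    if (cudaMalloc(&devIn, n * sizeof(float4)) != cudaSuccess) return false;
    if (cudaMalloc(&devOut, n * sizeof(float)) != cudaSuccess) {
        cudaFree(devIn);
        return false;
    }
    cudaMemcpy(devIn, hostIn.data(), n * sizeof(float4), cudaMemcpyHostToDevice);

    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    sumAllUniform<<<blocks, threadsPerBlock>>>(devIn, devOut, n);

    // The blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(hostOut.data(), devOut, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(devIn);
    cudaFree(devOut);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; i++)
        if (hostOut[i] != 4.0f * n) return false;
    return true;
}
```

    Calling `runUniformRead(256, 64)` from a host `main` should return true on a working device. Comparing its run time against a variant where each thread reads a different address would be one way to look for serialisation, but that measurement is left out here.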

    In the second loop (writing), which thread wins is undefined. As in any multi-threaded programming model, if multiple threads write to the same location, the programmer is responsible for understanding the race conditions. You have no control over which warp in the block will execute last, and no control over which thread within the last warp will complete the write, so you can’t predict what the final value will be.
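    The write race can be reproduced in a reduced form, next to one possible way of making the outcome predictable. This is a sketch under assumptions, not the asker's code: it uses a single `int` slot instead of `float4` to keep the example small, and the names `racyWrite`, `maxTidWrite`, and `runWrite` are hypothetical. The deterministic variant swaps the plain store for `atomicMax`, so the highest thread index wins regardless of execution order.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Racy: every thread stores its own index into the same location.
// One of the stored values survives, but which one is not defined.
__global__ void racyWrite(int *slot)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    *slot = tid;
}

// Deterministic alternative: atomicMax keeps the largest index,
// independent of the order in which warps and threads execute.
__global__ void maxTidWrite(int *slot)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    atomicMax(slot, tid);
}

// Hypothetical host helper: runs one of the two kernels and returns the
// final value of the slot, or -1 if the allocation or copy fails.
int runWrite(bool useAtomic, int blocks, int threadsPerBlock)
{
    assert(blocks > 0 && threadsPerBlock > 0);
    int result = -1;
    int *devSlot = nullptr;
    if (cudaMalloc(&devSlot, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(devSlot, &result, sizeof(int), cudaMemcpyHostToDevice);

    if (useAtomic)
        maxTidWrite<<<blocks, threadsPerBlock>>>(devSlot);
    else
        racyWrite<<<blocks, threadsPerBlock>>>(devSlot);

    // The blocking copy also waits for the kernel to finish.
    if (cudaMemcpy(&result, devSlot, sizeof(int),
                   cudaMemcpyDeviceToHost) != cudaSuccess)
        result = -1;
    cudaFree(devSlot);
    return result;
}
```

    With 4 blocks of 64 threads, `runWrite(true, 4, 64)` should always return 255, while `runWrite(false, 4, 64)` returns some index between 0 and 255 that may differ between runs and devices.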
