The Archive Base

Editorial Team
Asked: June 6, 2026

As a quick backdrop for my question: on x86, it is guaranteed that an individual memory access that is 4-byte aligned for a 32-bit word, or 8-byte aligned for a 64-bit word, will be atomic. Thus you can create “benign data races”, where at least one thread writes to a memory address while another thread reads from the same address, and the reader will not see the results of an incomplete write. Either the reading thread sees the entire effect of the write or it doesn’t.

What are the requirements in the CUDA programming model for creating these types of “benign” data races? For instance, if two threads in two separate but concurrently running blocks, on two different SMs, write a 64-bit value to the same global memory address, will each thread write its entire 64-bit value atomically, so that a third observer can only read back a fully updated 64-bit value? Or could the writes take place at a smaller granularity, so that a third observer reading the address after the two simultaneous writes might see a partial write, i.e. a mix of the two values?
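One way to probe the scenario empirically is a small experiment like the following. It is a sketch under stated assumptions: the names `racingStores` and `countTornResults` and the two bit patterns are hypothetical, and a run that finds no torn values does not prove that 64-bit stores are atomic; it only fails to find a counterexample on one GPU, driver and compiler. The target comes from `cudaMalloc`, whose allocations the CUDA Programming Guide describes as aligned to at least 256 bytes, so the 64-bit word is naturally aligned.

```cuda
// Hypothetical experiment: thread 0 of every block stores one of two 64-bit
// patterns to the same global address with a plain (non-atomic) store, which
// is a data race by design. The host then checks whether the final value is
// exactly one of the two patterns, or a mix of their 32-bit halves.
#include <cstdio>
#include <cuda_runtime.h>

constexpr unsigned long long kPatternA = 0x0000000000000000ULL;
constexpr unsigned long long kPatternB = 0xFFFFFFFFFFFFFFFFULL;

__global__ void racingStores(unsigned long long* target)
{
    if (threadIdx.x == 0) {
        // Even blocks write pattern A, odd blocks write pattern B.
        *target = (blockIdx.x % 2 == 0) ? kPatternA : kPatternB;
    }
}

// Repeats the race `trials` times and returns how many final values were torn
// (or -1 if device memory could not be allocated).
int countTornResults(int trials, int blocks)
{
    unsigned long long* dTarget = nullptr;
    if (cudaMalloc(&dTarget, sizeof(unsigned long long)) != cudaSuccess) return -1;

    int torn = 0;
    for (int t = 0; t < trials; ++t) {
        cudaMemset(dTarget, 0, sizeof(unsigned long long));
        racingStores<<<blocks, 32>>>(dTarget);
        unsigned long long result = 0;
        // This copy is issued to the same (default) stream after the kernel,
        // so it starts only once every block has finished.
        cudaMemcpy(&result, dTarget, sizeof(result), cudaMemcpyDeviceToHost);
        if (result != kPatternA && result != kPatternB) ++torn;
    }
    cudaFree(dTarget);
    return torn;
}

int main()
{
    int torn = countTornResults(1000, 1024);
    std::printf("torn results: %d\n", torn);
    return 0;
}
```

Note that this only examines the final value after the kernel completes; a third thread reading concurrently during the race would need a separate experiment.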

I understand that race conditions are normally something to avoid, but if the requirements for memory ordering are relaxed, then there is no need to explicitly use atomic read/write functions. That said, this depends on the atomicity of an individual read or write (i.e., how many bits, and at what alignment). Does anyone know where I can find this information?
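For comparison, the same race can be written with CUDA's built-in atomic functions, which removes the dependence on the size and alignment of plain accesses. The sketch is illustrative only: the helper names `atomicStore64`, `atomicLoad64`, `writerThenReader` and `allObservedValuesWhole` are hypothetical, and using `atomicAdd(p, 0ULL)` as an atomic load is a common idiom rather than a dedicated load API. The CUDA Programming Guide states that atomic functions do not act as memory fences and imply no ordering constraints on other memory operations, which matches the relaxed ordering described in the question.

```cuda
// Sketch: each 64-bit store and load below is a single atomic operation, so a
// reader can only ever observe a complete value written by some thread.
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

__device__ void atomicStore64(unsigned long long* p, unsigned long long v)
{
    atomicExch(p, v);            // replaces the whole 64-bit word atomically
}

__device__ unsigned long long atomicLoad64(unsigned long long* p)
{
    return atomicAdd(p, 0ULL);   // atomic read-modify-write that adds nothing
}

__global__ void writerThenReader(unsigned long long* slot, unsigned long long* observed)
{
    if (threadIdx.x == 0) {
        unsigned long long mine = (blockIdx.x % 2 == 0) ? 0ULL : ~0ULL;
        atomicStore64(slot, mine);
        // Whichever block wrote last, this read returns one complete value.
        observed[blockIdx.x] = atomicLoad64(slot);
    }
}

// Returns true if every value read by the kernel was one of the two patterns.
bool allObservedValuesWhole(int blocks)
{
    unsigned long long* dSlot = nullptr;
    unsigned long long* dObserved = nullptr;
    cudaMalloc(&dSlot, sizeof(unsigned long long));
    cudaMalloc(&dObserved, blocks * sizeof(unsigned long long));
    cudaMemset(dSlot, 0, sizeof(unsigned long long));

    writerThenReader<<<blocks, 32>>>(dSlot, dObserved);

    std::vector<unsigned long long> observed(blocks, 0ULL);
    cudaMemcpy(observed.data(), dObserved, blocks * sizeof(unsigned long long),
               cudaMemcpyDeviceToHost);
    cudaFree(dSlot);
    cudaFree(dObserved);

    for (unsigned long long v : observed)
        if (v != 0ULL && v != ~0ULL) return false;
    return true;
}

int main()
{
    std::printf("all observed values whole: %s\n",
                allObservedValuesWhole(1024) ? "yes" : "no");
    return 0;
}
```

The cost of this approach is that every access becomes an atomic read-modify-write instruction, whereas the question hopes to rely on plain loads and stores.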

1 Answer

    Editorial Team
    Added an answer on June 6, 2026 at 4:48 pm

    Update: @Heatsink has kindly pointed out that it is indeed possible to enforce some ordering and visibility of global memory writes between threads by using the __threadfence() function.
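    As an illustration of what __threadfence() makes possible, here is a sketch modeled on the "last block" reduction pattern that the CUDA Programming Guide uses to explain memory fence functions: each block publishes a partial sum, calls __threadfence() so that this write is observed device-wide before the block's counter increment, and the block that performs the final increment adds up all the partial sums. The names `blocksDone`, `sumWithThreadfence` and `gpuSum` are hypothetical. No block waits on another, so the pattern does not depend on which blocks happen to be resident at the same time.

    ```cuda
    // Sketch of the "last block" pattern: partial results are published with
    // a plain store, ordered before the counter increment by __threadfence().
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    __device__ unsigned int blocksDone = 0;   // hypothetical completion counter

    __global__ void sumWithThreadfence(const int* in, int n,
                                       volatile int* partial, int* total)
    {
        __shared__ int blockSum;
        if (threadIdx.x == 0) blockSum = 0;
        __syncthreads();

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) atomicAdd(&blockSum, in[i]);   // block-local sum in shared memory
        __syncthreads();

        if (threadIdx.x == 0) {
            partial[blockIdx.x] = blockSum;       // publish this block's result
            // Writes before the fence are observed by all threads in the
            // device as occurring before writes after it (the increment below).
            __threadfence();
            unsigned int prev = atomicInc(&blocksDone, gridDim.x);
            if (prev == gridDim.x - 1) {          // this block incremented last
                int sum = 0;
                for (unsigned int b = 0; b < gridDim.x; ++b)
                    sum += partial[b];            // volatile: each read is a real load
                *total = sum;
                blocksDone = 0;                   // reset for the next launch
            }
        }
    }

    int gpuSum(const std::vector<int>& data)
    {
        const int threads = 256;
        const int n = static_cast<int>(data.size());
        const int blocks = (n + threads - 1) / threads;
        int *dIn = nullptr, *dPartial = nullptr, *dTotal = nullptr;
        cudaMalloc(&dIn, n * sizeof(int));
        cudaMalloc(&dPartial, blocks * sizeof(int));
        cudaMalloc(&dTotal, sizeof(int));
        cudaMemcpy(dIn, data.data(), n * sizeof(int), cudaMemcpyHostToDevice);

        sumWithThreadfence<<<blocks, threads>>>(dIn, n, dPartial, dTotal);

        int total = 0;
        cudaMemcpy(&total, dTotal, sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(dIn);
        cudaFree(dPartial);
        cudaFree(dTotal);
        return total;
    }

    int main()
    {
        std::vector<int> data(100000, 1);
        std::printf("sum = %d\n", gpuSum(data));
        return 0;
    }
    ```

    Without the fence, the last block could observe the counter increments of other blocks before their partial sums, and add up stale values.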

    —

    Unless atomic functions are used, CUDA specifically does not guarantee any coherency when reading global memory that has been updated by another thread in the same kernel launch. It is only safe to read memory that was written by a previous kernel or memory copy.
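    The safe case, reading in one kernel what an earlier kernel wrote, might look like the minimal sketch below. The kernel names `produce` and `consume` and the helper `produceThenConsume` are hypothetical; the ordering comes from issuing both launches to the same (default) stream, so `consume` does not start until `produce` has finished and its global memory writes are visible.

    ```cuda
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    // First launch: every thread writes one element.
    __global__ void produce(int* buf, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) buf[i] = 2 * i;
    }

    // Second launch: each thread reads an element written by a different
    // thread (usually in a different block) of the previous launch.
    __global__ void consume(const int* buf, int* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = buf[n - 1 - i] + 1;
    }

    std::vector<int> produceThenConsume(int n)
    {
        int *dBuf = nullptr, *dOut = nullptr;
        cudaMalloc(&dBuf, n * sizeof(int));
        cudaMalloc(&dOut, n * sizeof(int));
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;

        produce<<<blocks, threads>>>(dBuf, n);         // same default stream,
        consume<<<blocks, threads>>>(dBuf, dOut, n);   // so this runs after produce

        std::vector<int> out(n, 0);
        cudaMemcpy(out.data(), dOut, n * sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(dBuf);
        cudaFree(dOut);
        return out;
    }

    int main()
    {
        std::vector<int> out = produceThenConsume(1000);
        std::printf("out[0] = %d\n", out[0]);  // buf[999] + 1, i.e. 1999
        return 0;
    }
    ```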

    So not only can you not assume anything about memory access patterns, you can’t even know when an update made to global memory by one thread will become visible to another thread, or indeed whether it will become visible at all.

    Of course, given the way the hardware is implemented in a given architecture, you may be able to find a way to implement some type of non-blocking synchronization between threads. However, I sincerely doubt that it would be possible to do that safely between blocks. What the threads in one block see will depend on which SM the block runs on, which blocks have run before, and where the updates made by those blocks currently are in the cache hierarchy.

    For threads within a block, the question is moot: they can communicate through shared memory, whose behavior is carefully specified by CUDA.
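    Within a block, that communication is ordered by the barrier `__syncthreads()`, which per the CUDA Programming Guide waits until all threads of the block reach it and makes their prior shared and global memory accesses visible to all threads in the block. A minimal sketch follows; the kernel `reverseInBlock`, the helper `reverseOnGpu`, and the fixed size of 64 elements are assumptions for illustration.

    ```cuda
    #include <cstdio>
    #include <vector>
    #include <cuda_runtime.h>

    constexpr int kN = 64;   // one block of kN threads, kN elements

    __global__ void reverseInBlock(int* data)
    {
        __shared__ int tile[kN];
        int t = threadIdx.x;
        tile[t] = data[t];            // each thread writes one shared slot
        __syncthreads();              // all writes above are now visible block-wide
        data[t] = tile[kN - 1 - t];   // read a slot written by another thread
    }

    // Assumes in.size() == kN.
    std::vector<int> reverseOnGpu(const std::vector<int>& in)
    {
        int* d = nullptr;
        cudaMalloc(&d, kN * sizeof(int));
        cudaMemcpy(d, in.data(), kN * sizeof(int), cudaMemcpyHostToDevice);
        reverseInBlock<<<1, kN>>>(d);
        std::vector<int> out(kN, 0);
        cudaMemcpy(out.data(), d, kN * sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(d);
        return out;
    }

    int main()
    {
        std::vector<int> in(kN);
        for (int i = 0; i < kN; ++i) in[i] = i;
        std::vector<int> out = reverseOnGpu(in);
        std::printf("out[0] = %d, out[%d] = %d\n", out[0], kN - 1, out[kN - 1]);
        return 0;
    }
    ```

    Removing the `__syncthreads()` call would reintroduce exactly the kind of race discussed above, since a thread could read a shared slot before its owner has written it.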

Related Questions

Quick question about the TransactionScope object. Found this on the internet: When you access
Quick question that requires a long explanation.. Say I have two tables - one
Quick question: I'm using jquery ajax to call a page that returns some json
quick question : Is it possible to make a pointer that can reference a
Quick question that I can't seem to find an answer for. If I am
Quick question here, i've got a process running that grabs RSS feeds and adds
Quick question: If I have a very large function/sub in a class that is
Quick question; I replaced a .css file which I was referencing in html/php that
Quick question. What do you think, I have a few sites that use a
Quick question. There is a legacy website (that is not under my control and

© 2021 The Archive Base. All Rights Reserved