Editorial Team
Asked: May 20, 2026

What is “coalesced” in a CUDA global memory transaction?

What does “coalesced” mean in a CUDA global memory transaction? I couldn’t understand it even after going through my CUDA guide. How do I do it? In the matrix example in the CUDA programming guide, is accessing the matrix row by row called “coalesced”, or is accessing it column by column called “coalesced”?
Which is correct, and why?

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Added an answer on May 20, 2026 at 5:39 am

    It’s likely that the information below applies only to compute capability 1.x, or CUDA 2.0. More recent architectures and CUDA 3.0 have more sophisticated global memory access, and in fact “coalesced global loads” are not even profiled for these chips.

    Also, this logic can be applied to shared memory to avoid bank conflicts.


    A coalesced memory transaction is one in which all of the threads in a half-warp (16 threads) access global memory at the same time. This is an oversimplification, but the way to achieve it is simply to have consecutive threads access consecutive memory addresses.

    So, if threads 0, 1, 2, and 3 read global memory addresses 0x0, 0x4, 0x8, and 0xc (four consecutive 4-byte words), it should be a coalesced read.
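The address pattern above can be modelled on the host. This is only a sketch, not device code: `word_address` and `is_consecutive_words` are hypothetical helper names, and the 4-byte element size is an assumption taken from the 0x0, 0x4, 0x8, 0xc example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte address that thread `tid` reads when every thread loads one 4-byte
// word from an array starting at `base` (host-side model, assumed helper).
std::uintptr_t word_address(std::uintptr_t base, unsigned tid) {
    return base + tid * sizeof(std::uint32_t);
}

// True when the addresses, taken in thread order, form one unbroken run
// of 4-byte words, i.e. consecutive threads read consecutive words.
bool is_consecutive_words(const std::vector<std::uintptr_t>& addrs) {
    for (std::size_t i = 1; i < addrs.size(); ++i) {
        if (addrs[i] != addrs[i - 1] + sizeof(std::uint32_t)) return false;
    }
    return true;
}
```

Feeding thread IDs 0 through 3 with a base of 0x0 into `word_address` reproduces the four addresses in the example, and `is_consecutive_words` accepts them.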

    In a matrix example, keep in mind that you want your matrix to reside linearly in memory. You can do this however you want, and your memory access should reflect how your matrix is laid out. So, the 3×4 matrix below

    0 1 2 3
    4 5 6 7
    8 9 a b
    

    could be stored row after row (row-major order), like this, so that (r,c) maps to memory offset r*4 + c

    0 1 2 3 4 5 6 7 8 9 a b
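The row-after-row layout can be written as a one-line index function. This is a minimal sketch; `row_major_offset` is a hypothetical name, and the column count is passed in rather than hard-coded to 4.

```cpp
#include <cstddef>

// Linear offset of element (r, c) in a matrix stored row after row
// with `cols` columns. For the 3x4 example, cols == 4, giving r*4 + c.
constexpr std::size_t row_major_offset(std::size_t r, std::size_t c,
                                       std::size_t cols) {
    return r * cols + c;
}
```

For the 3×4 matrix, element (2,3) lands at offset 11, which is the entry written as `b` in the flattened layout.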
    

    Suppose you need to access each element once, and say you have four threads. Which thread will handle which elements? Probably either

    thread 0:  0, 1, 2
    thread 1:  3, 4, 5
    thread 2:  6, 7, 8
    thread 3:  9, a, b
    

    or

    thread 0:  0, 4, 8
    thread 1:  1, 5, 9
    thread 2:  2, 6, a
    thread 3:  3, 7, b
    

    Which is better? Which will result in coalesced reads, and which will not?

    Either way, each thread makes three accesses. Let’s look at the first access and see if the threads access memory consecutively. In the first option, the first access is 0, 3, 6, 9. Not consecutive, not coalesced. In the second option, it’s 0, 1, 2, 3. Consecutive! Coalesced! Yay!
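The comparison of the two assignments can be checked with a small host-side model. This is a sketch under assumptions: all function names are hypothetical, and it only tests whether consecutive threads touch consecutive elements on a given step, which is the simplified rule used in this answer. In an actual kernel launched as a single block, the interleaved index would likely be written as `step * blockDim.x + threadIdx.x`.

```cpp
#include <cstddef>
#include <vector>

// Option 1: each thread owns a contiguous chunk of `per_thread` elements.
std::size_t blocked_index(std::size_t tid, std::size_t step,
                          std::size_t per_thread) {
    return tid * per_thread + step;
}

// Option 2: threads interleave, so on each step thread `tid` takes
// element step * num_threads + tid.
std::size_t interleaved_index(std::size_t tid, std::size_t step,
                              std::size_t num_threads) {
    return step * num_threads + tid;
}

// Element indices touched by all threads on one step, in thread order.
std::vector<std::size_t> step_indices(bool interleaved, std::size_t step,
                                      std::size_t num_threads,
                                      std::size_t per_thread) {
    std::vector<std::size_t> out;
    for (std::size_t t = 0; t < num_threads; ++t) {
        out.push_back(interleaved ? interleaved_index(t, step, num_threads)
                                  : blocked_index(t, step, per_thread));
    }
    return out;
}

// True when consecutive threads touch consecutive elements.
bool is_consecutive(const std::vector<std::size_t>& idx) {
    for (std::size_t i = 1; i < idx.size(); ++i) {
        if (idx[i] != idx[i - 1] + 1) return false;
    }
    return true;
}
```

With four threads and three elements per thread, the blocked option yields 0, 3, 6, 9 on the first step and fails the check, while the interleaved option yields 0, 1, 2, 3 and passes it on every step.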

    The best way is probably to write your kernel and then profile it to see if you have non-coalesced global loads and stores.
