The Archive Base

Editorial Team
Asked: June 4, 2026

I am brand new to CUDA programming. I have a CUDA subroutine that, hopefully, models the diffusion equation with a simple source:

attributes(global) subroutine diff_time_stepper(v,diffconst)


real*8 :: v(:,:)
real*8 :: diffconst

real*8 :: vintermed
integer :: i,j,m
integer :: nx, ny

nx=256
ny=256

i=(blockIdx%x-1)*blockDim%x+threadIdx%x
j=(blockIdx%y-1)*blockDim%y+threadIdx%y

if (i<nx .and. j<ny .and. i>1 .and. j>1) then
  vintermed=v(i,j)+diffconst*(v(i-1,j)-2.*v(i,j)+v(i+1,j)+v(i,j-1)-2.*v(i,j)+v(i,j+1))
  v(i,j)=vintermed
! add a source for the heck of it
  if (i==64 .and. j==64) v(i,j)=v(i,j)+1
endif


end subroutine

My question: This routine seems to work, giving reasonable results (even though it is not running as fast as I had hoped). But do I have “backward dependencies” here? In particular, vintermed gets set by a function involving several v’s, and then v gets set equal to vintermed. And then v(64,64) gets set after this calculation. Are these potential problems? More generally, even though I have found a lot of discussion about how to program for CUDA, I have found very little discussion of this issue of backward dependence, which seems to me of paramount importance. Can anyone point me to a good discussion of this? Thanks.

1 Answer

  1. Editorial Team
    Added an answer on June 4, 2026 at 7:44 am

    Yes, you do. The threads read and update v in an order that is, in general, unpredictable (although there are things you can say about the behaviour of threads within a warp in the same block, etc.). The same would be true (with different caveats) with, say, OpenMP or whatever your favourite CPU-based threading framework is.

    The diffusion problem is more robust to this than most (e.g., Gauss-Seidel iteration deliberately uses values that have already been updated, whereas Jacobi iteration uses only old values), but you’ll still get problems because different threads see values from different timesteps. In particular, whether a neighbouring thread reads v(64,64) before or after it is updated will vary, which in general leads to inconsistent shapes of the peak around the source.

    You can make sure this doesn’t happen within a block by synchronizing threads after reading:

    ...
    real*8 :: left, right, up, down, centre
    
    nx=256
    ny=256
    
    i=(blockIdx%x-1)*blockDim%x+threadIdx%x
    j=(blockIdx%y-1)*blockDim%y+threadIdx%y
    
    if (i<nx .and. j<ny .and. i>1 .and. j>1) then
      left  = v(i-1,j)
      right = v(i+1,j)
      up    = v(i,  j+1)
      down  = v(i,  j-1)
      centre= v(i,j)
    endif
    call syncthreads()
    
    if (i<nx .and. j<ny .and. i>1 .and. j>1) then
      v(i,j) = centre + diffconst*(left+right+up+down-4.*centre)
      if (i==64 .and. j==64) v(i,j)=v(i,j)+1
    endif
    

    But this just pushes the problems to the block boundaries rather than the thread boundaries; you still don’t know the order in which the blocks are reading/updating v. But the only way to synchronize blocks is with the end of the kernel.

    There are a few ways around this, all involving using more memory or less parallelism. One way would be to have every thread read from one array, vold, say, and write to a second, vnew; then all reads come from the old array and all writes go to the new one, so there’s no synchronization issue within a timestep. You then swap the roles of the two arrays each timestep, so every odd timestep reads vold and writes vnew, and every even timestep goes the other way.
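
    A minimal sketch of that two-array scheme in CUDA Fortran follows. The names diff_step, diffusion_kernels, va and vb, the 16x16 block size, and the values of nsteps and diffconst are all assumptions made for illustration; they are not from the question. It relies on kernels launched on the default stream running one after another, so the end of each launch acts as the grid-wide synchronization point.

    ```fortran
    module diffusion_kernels
      use cudafor
      implicit none
      integer, parameter :: nx = 256, ny = 256
    contains
      ! Reads only from vold and writes only to vnew, so no thread can
      ! pick up a value that another thread has already advanced.
      attributes(global) subroutine diff_step(vold, vnew, diffconst)
        real(8), intent(in) :: vold(nx,ny)
        real(8)             :: vnew(nx,ny)   ! boundary cells are left untouched
        real(8), value      :: diffconst     ! scalar passed by value from the host
        integer :: i, j

        i = (blockIdx%x-1)*blockDim%x + threadIdx%x
        j = (blockIdx%y-1)*blockDim%y + threadIdx%y

        if (i > 1 .and. i < nx .and. j > 1 .and. j < ny) then
          vnew(i,j) = vold(i,j) + diffconst*(vold(i-1,j) + vold(i+1,j) &
                    + vold(i,j-1) + vold(i,j+1) - 4.d0*vold(i,j))
          ! same source term as in the question
          if (i == 64 .and. j == 64) vnew(i,j) = vnew(i,j) + 1.d0
        endif
      end subroutine diff_step
    end module diffusion_kernels

    program diffuse
      use cudafor
      use diffusion_kernels
      implicit none
      real(8), device :: va(nx,ny), vb(nx,ny)   ! the two buffers
      real(8) :: v(nx,ny), diffconst
      type(dim3) :: grid, tblock
      integer :: step, nsteps

      nsteps = 100          ! hypothetical run length
      diffconst = 0.1d0     ! hypothetical; below the 0.25 stability limit
      v = 0.d0
      va = v                ! host-to-device copies
      vb = v                ! initialize both: the kernel never writes the boundary

      tblock = dim3(16, 16, 1)
      grid   = dim3(nx/16, ny/16, 1)

      do step = 1, nsteps
        if (mod(step,2) == 1) then
          call diff_step<<<grid,tblock>>>(va, vb, diffconst)   ! odd: va -> vb
        else
          call diff_step<<<grid,tblock>>>(vb, va, diffconst)   ! even: vb -> va
        endif
      enddo

      ! copy back whichever buffer was written last
      if (mod(nsteps,2) == 1) then
        v = vb
      else
        v = va
      endif
      print *, 'v(64,64) =', v(64,64)
    end program diffuse
    ```

    Swapping the argument order on alternate steps avoids copying data between the buffers; the cost is one extra nx-by-ny array on the device.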
