The Archive Base

Editorial Team
Asked: May 26, 2026

Can someone please help me with a very simple example of how to use shared memory? The example in the CUDA C Programming Guide seems cluttered with irrelevant details.

For example, if I copy a large array to the device global memory and want to square each element, how can shared memory be used to speed this up? Or is it not useful in this case?

1 Answer

    Editorial Team, answered May 26, 2026 at 2:47 pm

    In the specific case you mention, shared memory is not useful, because each data element is used only once. Shared memory helps only when data staged there is reused several times, ideally with good (bank-conflict-free) access patterns. The reason is simple: reading directly from global memory costs 1 global memory read and no shared memory accesses, whereas staging the data through shared memory first costs 1 global memory read plus a shared memory write and a shared memory read, which takes longer.
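For the squaring case in the question, a plain kernel that reads and writes global memory directly is the natural choice. The sketch below is a minimal illustration, not code from the question: the names `square_in_place` and `square_on_device` are hypothetical, and the launch configuration (256 threads per block, at most 1024 blocks, with a grid-stride loop covering the rest) is an assumed choice.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// Squares every element in place. Each element is read once from global
// memory and written once, so there is no reuse for shared memory to exploit.
__global__ void square_in_place(float *data, int n)
{
    // Grid-stride loop: correct for any n, whatever the grid size.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x) {
        float v = data[i];  // one global read
        data[i] = v * v;    // one global write
    }
}

// Hypothetical host helper: copies `host` to the device, squares it there,
// and copies the result back. Returns false if any CUDA call fails.
bool square_on_device(std::vector<float> &host)
{
    if (host.empty()) return true;
    int n = static_cast<int>(host.size());
    size_t bytes = host.size() * sizeof(float);

    float *d_data = nullptr;
    if (cudaMalloc(&d_data, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(d_data, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;  // assumed block size
        // Cap the grid; the grid-stride loop covers any remaining elements.
        const int blocks = std::min((n + threads - 1) / threads, 1024);
        square_in_place<<<blocks, threads>>>(d_data, n);
        // cudaMemcpy waits for the kernel, so launch errors surface here too.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host.data(), d_data, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_data);
    return ok;
}
```

Because consecutive threads access consecutive addresses, the global reads and writes are coalesced, which matters far more for this kernel than shared memory would.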

    Here is a simple example in which each thread in the block computes the square of its own value plus the square of the average of its left and right neighbors (wrapping around at the ends of the array):

      __global__ void compute_it(float *data)
      {
         int tid = threadIdx.x;
         __shared__ float myblock[1024];
         float tmp;

         // load the thread's data element into shared memory
         myblock[tid] = data[tid];

         // ensure that all threads have loaded their values into
         // shared memory; otherwise, one thread might be computing
         // on uninitialized data.
         __syncthreads();

         // compute the average of this thread's left and right neighbors,
         // wrapping around at both ends of the block
         tmp = (myblock[tid > 0 ? tid - 1 : 1023] + myblock[tid < 1023 ? tid + 1 : 0]) * 0.5f;
         // square the previous result and add my value, squared
         tmp = tmp*tmp + myblock[tid] * myblock[tid];

         // write the result back to global memory
         data[tid] = tmp;
      }
    

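    Note that this kernel is meant to run as a single block: it assumes a block dimension of (1024, 1, 1) and a grid dimension of (1, 1, 1). Extending it to more blocks is fairly straightforward, but the threads at the edges of each block then need neighbors held by adjacent blocks, so each block must also load one extra (halo) element on each side.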
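One possible multi-block extension is sketched below. It is an illustration under stated assumptions, not the answerer's code: the block size is fixed by a hypothetical `BLOCK_SIZE` macro (256 here), neighbors wrap around over the whole array as in the single-block kernel, and the names `compute_it_multi` and `compute_on_device` are made up. Each block stages its elements plus one halo element on each side in shared memory. Unlike the original, it reads from `in` and writes to `out`, because with several blocks one block could otherwise overwrite an element that a neighboring block still has to load as its halo.

```cuda
#include <vector>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed block size; the kernel must be launched with it

// Multi-block version of compute_it: out[i] = avg(left, right)^2 + in[i]^2,
// with neighbors wrapping around at the ends of the whole array.
__global__ void compute_it_multi(const float *in, float *out, int n)
{
    __shared__ float tile[BLOCK_SIZE + 2];  // slots 0 and last hold the halo
    int block_start = blockIdx.x * blockDim.x;
    int gid = block_start + threadIdx.x;
    int lid = threadIdx.x + 1;

    // Index of the last thread with valid data (the final block may be partial).
    int remaining = n - block_start;
    int last = (remaining < (int)blockDim.x ? remaining : (int)blockDim.x) - 1;

    if (gid < n) tile[lid] = in[gid];
    // The first thread loads the left halo, the last valid thread the right one.
    if (threadIdx.x == 0)
        tile[0] = in[block_start > 0 ? block_start - 1 : n - 1];
    if ((int)threadIdx.x == last)
        tile[lid + 1] = in[gid + 1 < n ? gid + 1 : 0];

    // Every thread reaches the barrier, including those with gid >= n.
    __syncthreads();

    if (gid < n) {
        float avg = (tile[lid - 1] + tile[lid + 1]) * 0.5f;
        out[gid] = avg * avg + tile[lid] * tile[lid];
    }
}

// Hypothetical host helper: runs compute_it_multi on host_in and stores the
// result in host_out. Returns false if any CUDA call fails.
bool compute_on_device(const std::vector<float> &host_in, std::vector<float> &host_out)
{
    host_out.assign(host_in.size(), 0.0f);
    if (host_in.empty()) return true;
    int n = static_cast<int>(host_in.size());
    size_t bytes = host_in.size() * sizeof(float);

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }

    bool ok = cudaMemcpy(d_in, host_in.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
        compute_it_multi<<<blocks, BLOCK_SIZE>>>(d_in, d_out, n);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(host_out.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return ok;
}
```

Each element is loaded from global memory about once (plus two halo loads per block) but read three times from shared memory, which is the kind of reuse that makes shared memory worthwhile.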

Related Questions

I am very new to RegEx -- so can someone please help me figure
EDIT 1: Can someone please help me create a very very simplistic sample page
Can someone please help me out with printing the contents of an IFrame via
Can someone please help me? In Perl, what is the difference between: exec command;
Can someone please help quickly mute or unmute the stage volume in Flash CS3
can someone please help me. why does this return an error: Dim stuff As
Can someone please help me with using Regex with NSPredicate? NSString *regex = @(?:[A-Za-z0-9]);
Can someone please help me out with a JavaScript/jQuery solution for this arithmetic problem:
Can someone please help me figure out a way to achieve the following (see
Could someone please help explain why I can't get this to work? I properly

© 2021 The Archive Base. All Rights Reserved