The Archive Base

Editorial Team
Asked: June 18, 2026


I am studying theoretical stuff on GPUs used for scientific applications and I found this sentence:

High arithmetic intensity and many data elements mean that memory access latency can be hidden with calculations instead of big data caches.

What exactly does this mean? Can it be interpreted as a suggestion to avoid storing precomputed results when programming for a GPU, and to recompute them every time we run a function on the device?

E.g., suppose we have code that runs a long loop to compute a large array, with tons of calculations in it. Suppose also that we could precompute some partial arrays that would let the loop skip some computations, even some that are not very expensive. According to the quote, should we avoid this and recompute these values on every iteration instead?

1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 6:38 am

    GPUs have access to several types of memory. Global memory is the type you use to hand data to the GPU and to read results back once it has finished computing (for example, a standard GTX 480 has 1.5 GB of it).

    This memory has high bandwidth, but also high latency (around 400-800 cycles on a GTX 480). So instead of precomputing values, storing them in global memory, and then fetching them again (paying that latency on each fetch), you are often better off recomputing them on the GPU when the computation is cheap. That way, you do not have to wait for memory to deliver the precomputed data.
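    To make the trade-off concrete, here is a minimal CUDA sketch of the two options. Everything in it is an assumption made for illustration: the coefficient function `weight`, the kernel names, and the host helper `compare_variants` are hypothetical and do not come from the question. Variant A fetches a precomputed coefficient from global memory; variant B recomputes it in registers and so issues one global load per element instead of two.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical cheap coefficient, standing in for one entry of a "partial array".
__host__ __device__ inline float weight(int i) {
    return 1.0f + 0.5f * static_cast<float>(i);
}

// Variant A: the coefficient was precomputed and is fetched from global memory.
__global__ void scale_from_table(const float* in, const float* table,
                                 float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * table[i];   // two global loads per element
}

// Variant B: the coefficient is recomputed in registers.
__global__ void scale_recompute(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * weight(i);  // one global load per element
}

// Runs both variants and returns the largest absolute difference between
// their outputs, or -1.0f if any CUDA call failed.
float compare_variants(int n) {
    std::vector<float> h_in(n), h_table(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) {
        h_in[i] = 0.25f * static_cast<float>(i % 8);
        h_table[i] = weight(i);             // the "precomputed" array
    }
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_in = nullptr, *d_table = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_table, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_in, h_in.data(), bytes,
                          cudaMemcpyHostToDevice) == cudaSuccess;
    ok = ok && cudaMemcpy(d_table, h_table.data(), bytes,
                          cudaMemcpyHostToDevice) == cudaSuccess;
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    if (ok) {
        scale_from_table<<<blocks, threads>>>(d_in, d_table, d_out, n);
        ok = cudaMemcpy(h_a.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    if (ok) {
        scale_recompute<<<blocks, threads>>>(d_in, d_out, n);
        ok = cudaMemcpy(h_b.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_table);
    cudaFree(d_out);
    if (!ok) return -1.0f;
    float max_diff = 0.0f;
    for (int i = 0; i < n; ++i)
        max_diff = std::fmax(max_diff, std::fabs(h_a[i] - h_b[i]));
    return max_diff;
}
```

    Both variants should produce the same output, so `compare_variants` only checks agreement. Which one is faster depends on the hardware and on how expensive the recomputation really is, so it is worth timing both rather than assuming.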

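    If all the threads of a warp (the group of threads that execute together) are waiting on a memory fetch, the warp stalls: its threads cannot advance because the data has not arrived. The GPU hides this latency by running other warps that have arithmetic ready to execute, which is why high arithmetic intensity and many data elements help. GPUs can calculate quite a lot in 400-800 cycles, so it is often better to trade memory fetches for computation.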

    That being said, you can use the other types of memory that are available to you. For example, in CUDA you have access to on-chip memory (shared memory), which is very fast and has very low latency. You can have one thread in a warp calculate something, store it in shared memory, and have the other threads use that value (shared memory is visible to every thread in the block once the threads have synchronized). So you move the precalculation to the GPU, and use on-chip memory to retrieve the precomputed values.
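    The shared-memory pattern described above could look roughly like the following sketch. The per-block value `block_factor`, the kernel name, and the host helper `shared_factor_max_error` are hypothetical names chosen for illustration. For simplicity the sketch has one thread per block (rather than per warp) compute the value, with `__syncthreads()` making it visible to the rest of the block.

```cuda
#include <cmath>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical per-block value that every thread in the block needs.
__host__ __device__ inline float block_factor(int block) {
    return 1.0f / (1.0f + static_cast<float>(block));
}

__global__ void scale_by_block_factor(const float* in, float* out, int n) {
    __shared__ float factor;                 // one copy per block, on chip
    if (threadIdx.x == 0) factor = block_factor(blockIdx.x);
    __syncthreads();                         // wait until the value is written
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * factor;      // every thread reuses the value
}

// Runs the kernel and returns the largest absolute error against a CPU
// reference, or -1.0f if any CUDA call failed.
float shared_factor_max_error(int n) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    std::vector<float> h_in(n), h_out(n);
    for (int i = 0; i < n; ++i) h_in[i] = 0.25f * static_cast<float>(i % 8);

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_out, bytes) == cudaSuccess;
    ok = ok && cudaMemcpy(d_in, h_in.data(), bytes,
                          cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        scale_by_block_factor<<<blocks, threads>>>(d_in, d_out, n);
        ok = cudaMemcpy(h_out.data(), d_out, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);
    cudaFree(d_out);
    if (!ok) return -1.0f;

    float max_err = 0.0f;
    for (int i = 0; i < n; ++i) {
        float expected = h_in[i] * block_factor(i / threads);
        max_err = std::fmax(max_err, std::fabs(h_out[i] - expected));
    }
    return max_err;
}
```

    The barrier sits outside the `if (i < n)` guard so that every thread in the block reaches it, including threads past the end of the array.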
