The Archive Base

Editorial Team
Asked: June 9, 2026

I need to load 128 bits of data per thread in CUDA C++

I need to load 128 bits of data per thread in CUDA C++. Which approach is best in this case for maximum performance and for compatibility with the CPU version of the code?
Will the following ways of accessing the data perform equally well?

1: Use two 64-bit loads:

unsigned __int64 src1 = arr[2 * threadIdx.x];
unsigned __int64 src2 = arr[2 * threadIdx.x + 1];

2: Use a struct of two 64-bit fields:

struct T_src { unsigned __int64 src1, src2; };
T_src src = arr[threadIdx.x];

3: Use a CUDA-specific vector type:

ulong2 src = arr[threadIdx.x];
1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 11:43 pm

    Accessing memory in the GPU’s “native” terms, using CUDA-defined types and primitives, is the most likely way to maximize performance. This means option #3 in your question.
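One way to see why a native vector type can help is to compare alignment. The sketch below is an illustration under stated assumptions, not a measurement: `PlainPair`, `AlignedPair`, and `xor_halves` are hypothetical names, and the XOR is only placeholder work. CUDA's built-in 128-bit vector types are 16-byte aligned, which allows the compiler to use a single 128-bit load per thread, whereas a plain struct of two 64-bit fields only guarantees the alignment of its members.

```cpp
#include <cstddef>

// Option 2 from the question: two 64-bit fields with natural alignment.
struct PlainPair { unsigned long long src1, src2; };

// Same fields, forced to 16-byte alignment like CUDA's 128-bit vector types.
struct alignas(16) AlignedPair { unsigned long long src1, src2; };

static_assert(sizeof(PlainPair) == 16, "both layouts hold 128 bits");
static_assert(sizeof(AlignedPair) == 16, "both layouts hold 128 bits");
static_assert(alignof(PlainPair) < 16, "plain struct is not 16-byte aligned");
static_assert(alignof(AlignedPair) == 16, "aligned struct is 16-byte aligned");

#ifdef __CUDACC__
// Only compiled by nvcc. Each thread reads one 16-byte element; because the
// element type is 16-byte aligned, the compiler may emit one 128-bit load.
__global__ void xor_halves(const AlignedPair* in, unsigned long long* out,
                           std::size_t n)
{
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) {
        AlignedPair v = in[i];        // single 128-bit element per thread
        out[i] = v.src1 ^ v.src2;     // placeholder work on both halves
    }
}
#endif
```

Whether option 2 ends up slower in practice depends on what the compiler generates, so it is worth inspecting the generated code or profiling before drawing conclusions.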

    If you intend to write code that will run on CUDA and can also run on a stand-alone CPU when recompiled, I’d suggest coding for CUDA performance first and then back-porting for host CPU execution. CUDA is pickier about how things must be set up or structured than most host CPU architectures, and the performance benefits of doing things “right” for CUDA will far exceed the costs of doing things slightly suboptimally for the host CPU case.

    I’d still use option #3 for the CUDA case and define a ulong2 structure for the host CPU case. Copying that structure around in the host CPU case will still require two (or four) memory moves behind the scenes, but it’s going to require that no matter what you do in source code. Use the simplest, easiest to read and understand source style and let the compiler take care of the heavy lifting.
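A minimal single-source sketch of that idea follows. The names `Src128`, `HOST_DEVICE`, `fold`, `fold_kernel`, and `fold_host` are hypothetical, and the XOR stands in for whatever per-element work the real code does. One assumption differs from the question: the sketch uses `ulonglong2` rather than `ulong2`, because `unsigned long` is only 32 bits wide on Windows, so `ulong2` would not hold 128 bits there.

```cpp
#include <cstddef>
#include <vector>

#ifdef __CUDACC__
  #include <cuda_runtime.h>
  using Src128 = ulonglong2;   // CUDA built-in: fields x and y, 16-byte aligned
  #define HOST_DEVICE __host__ __device__
#else
  // Host-only stand-in with the same field names, size, and alignment.
  struct alignas(16) Src128 { unsigned long long x, y; };
  #define HOST_DEVICE
#endif

static_assert(sizeof(Src128) == 16, "Src128 must be exactly 128 bits");

// Per-element work shared by the GPU and CPU paths (placeholder: XOR).
HOST_DEVICE inline unsigned long long fold(const Src128& s)
{
    return s.x ^ s.y;
}

#ifdef __CUDACC__
// GPU path: one 128-bit element per thread.
__global__ void fold_kernel(const Src128* in, unsigned long long* out,
                            std::size_t n)
{
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) out[i] = fold(in[i]);
}
#endif

// CPU path: the same source, driven by a plain loop.
inline std::vector<unsigned long long> fold_host(const std::vector<Src128>& in)
{
    std::vector<unsigned long long> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) out[i] = fold(in[i]);
    return out;
}
```

With this layout the element type and the per-element function are written once, and only the loop (kernel launch versus `for`) differs between the two builds.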
