The Archive Base

Editorial Team
Asked: May 13, 2026

I am attempting to write a simple particle system that leverages CUDA to do

I am attempting to write a simple particle system that uses CUDA to update the particle positions. Right now I define a particle as an object with a position made up of three float values and a velocity also made up of three float values. When updating the particles, I add a constant value to the Y component of the velocity to simulate gravity, then add the velocity to the current position to get the new position. In terms of memory management, is it better to maintain two separate arrays of floats to store the data, or to structure it in an object-oriented way? Something like this:

struct Vector
{
    float x, y, z;
};

struct Particle
{
    Vector position;
    Vector velocity;
};

It seems like the size of the data is the same with either method (4 bytes per float, 3 floats per Vector, 2 Vectors per Particle, 24 bytes in total). It seems like the OO approach would allow more efficient data transfer between the CPU and GPU, because I could use a single memory copy statement instead of two (and in the long run more, as a few other pieces of information about particles will become relevant, like age, lifetime, weight/mass, temperature, etc.). There is also the simple readability of the code and the ease of dealing with it, which makes me inclined toward the OO approach. But the examples I have seen don’t use structured data, so it makes me wonder if there is a reason.

So the question is which is better: individual arrays of data or structured objects?

1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 4:00 pm

    It’s common in data parallel programming to talk about “Struct of Arrays” (SOA) versus “Array of Structs” (AOS). Of your two options, the separate arrays of floats are SOA and the array of Particle structs is AOS. Many parallel programming paradigms, in particular SIMD-style paradigms, prefer SOA.

    In GPU programming, SOA is typically preferred because it optimises accesses to global memory. You can view the recorded presentation on Advanced CUDA C from GTC last year for a detailed description of how the GPU accesses memory.

    The main point is that memory transactions have a minimum size of 32 bytes and you want to maximise the efficiency of each transaction.

    With AOS:

    position[base + tid].x = position[base + tid].x + velocity[base + tid].x * dt;
    //  ^ write to every third address                    ^ read from every third address
    //                           ^ read from every third address
    

    With SOA:

    position.x[base + tid] = position.x[base + tid] + velocity.x[base + tid] * dt;
    //  ^ write to consecutive addresses                  ^ read from consecutive addresses
    //                           ^ read from consecutive addresses
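As a rough host-side illustration of the SOA layout above, here is a minimal C++ sketch. The `ParticlesSoA` type, its member names, and the `update` function are hypothetical names chosen for this example, and scaling gravity and velocity by `dt` is an assumption carried over from the snippets in this answer. In a CUDA kernel the loop index would be replaced by the thread index (`base + tid`).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SOA container: one contiguous array per component, so that
// neighbouring particles' x values sit next to each other in memory.
struct ParticlesSoA {
    std::vector<float> px, py, pz;  // position components
    std::vector<float> vx, vy, vz;  // velocity components

    explicit ParticlesSoA(std::size_t n)
        : px(n, 0.0f), py(n, 0.0f), pz(n, 0.0f),
          vx(n, 0.0f), vy(n, 0.0f), vz(n, 0.0f) {}

    std::size_t size() const { return px.size(); }
};

// One simulation step: apply gravity to the Y velocity, then integrate.
// Index i plays the role of (base + tid) in the kernel snippets.
void update(ParticlesSoA& p, float gravity, float dt) {
    // All component arrays must stay the same length.
    assert(p.py.size() == p.size() && p.pz.size() == p.size());
    assert(p.vx.size() == p.size() && p.vy.size() == p.size() &&
           p.vz.size() == p.size());

    for (std::size_t i = 0; i < p.size(); ++i) {
        p.vy[i] += gravity * dt;
        p.px[i] += p.vx[i] * dt;
        p.py[i] += p.vy[i] * dt;
        p.pz[i] += p.vz[i] * dt;
    }
}
```

Each component array can be copied to the device separately, or, as one possible design, all six can be carved out of a single allocation so that one copy still suffices.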
    

    In the second case, reading from consecutive addresses means that every byte fetched is used, so you get 100% efficiency. In the first case only one float in every three fetched is used, giving 33%. Note that on older GPUs (compute capability 1.0 and 1.1) the situation is much worse (13% efficiency).
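The 100% and 33% figures can be reproduced with a small back-of-the-envelope model. The sketch below is plain C++, not a description of how any particular GPU schedules transactions. It assumes memory is fetched in aligned 32-byte segments and that elements do not overlap. `transaction_efficiency` and its parameters are hypothetical names for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified model: `threads` threads each read one element of `elem_bytes`
// bytes, spaced `stride_bytes` apart, and memory is fetched in aligned
// segments of `segment_bytes`. Returns useful bytes / fetched bytes.
double transaction_efficiency(std::size_t threads, std::size_t elem_bytes,
                              std::size_t stride_bytes,
                              std::size_t segment_bytes = 32) {
    assert(threads > 0 && elem_bytes > 0 && segment_bytes > 0);
    assert(stride_bytes >= elem_bytes);  // elements assumed not to overlap

    std::set<std::size_t> segments;  // distinct segments touched
    for (std::size_t t = 0; t < threads; ++t) {
        const std::size_t first = t * stride_bytes;
        const std::size_t last = first + elem_bytes - 1;
        for (std::size_t s = first / segment_bytes; s <= last / segment_bytes; ++s) {
            segments.insert(s);
        }
    }
    const double useful = static_cast<double>(threads * elem_bytes);
    const double fetched = static_cast<double>(segments.size() * segment_bytes);
    return useful / fetched;
}
```

Under this model, 32 threads reading one 4-byte float each with a 4-byte stride (SOA) touch 4 segments for 128 useful bytes. With a 12-byte stride (the `.x` of a three-float struct) they touch 12 segments for the same 128 useful bytes, which is where one third comes from.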

    There is one other possibility: if the struct held two or four floats, each thread could load the whole struct in one access and you could read the AOS with 100% efficiency:

    float4 lpos;
    float4 lvel;
    lpos = position[base + tid];
    lvel = velocity[base + tid];
    lpos.x += lvel.x * dt;
    //...
    position[base + tid] = lpos;
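To show what padding the three-float struct to four floats looks like, here is a minimal host-side C++ sketch. `Vec4` is a hypothetical stand-in for CUDA's built-in `float4`, `integrate_one` is a name invented for this example, and the size checks assume the usual 4-byte `float`.

```cpp
#include <cassert>
#include <cstddef>

// The three-float struct from the question: 12 bytes per element.
struct Vector {
    float x, y, z;
};

// Hypothetical stand-in for float4: a fourth component (unused padding here)
// rounds the struct up to 16 bytes with 16-byte alignment.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

static_assert(sizeof(Vector) == 12, "assumes 4-byte float");
static_assert(sizeof(Vec4) == 16 && alignof(Vec4) == 16,
              "Vec4 should be one 16-byte aligned unit");

// Mirrors the snippet above: load the whole struct, update it locally,
// then store the whole struct back. Index i stands in for (base + tid).
void integrate_one(Vec4* position, const Vec4* velocity, std::size_t i, float dt) {
    assert(position != nullptr && velocity != nullptr);
    Vec4 lpos = position[i];
    const Vec4 lvel = velocity[i];
    lpos.x += lvel.x * dt;
    lpos.y += lvel.y * dt;
    lpos.z += lvel.z * dt;
    position[i] = lpos;
}
```

The cost of this layout is the extra four bytes per vector. Whether that trade is worthwhile depends on how many components the kernel actually reads.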
    

    Again, check out the Advanced CUDA C presentation for the details.


© 2021 The Archive Base. All Rights Reserved