The Archive Base

Editorial Team
Asked: June 1, 2026

I need to read a massive amount of data into a buffer (about 20gig).

I need to read a massive amount of data into a buffer (about 20 GB). I have 192 GB of very fast DDR RAM available, so memory size is not an issue. However, I am finding that the following code runs slower and slower the further it gets into the buffer. The Visual C profiler tells me that 68% of the 12-minute execution time is spent in the two statements inside the loop in myFunc(). I am running 64-bit Windows 7 on a very fast Dell with two CPUs, six physical cores each (24 logical cores), and all 24 cores are completely maxed out while this runs.

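#include <stdlib.h>

#define ONE_BILLION 1000000000   /* assumed value; the definition was not shown */
#define TREAM_COUNT 9000
#define ARRAY_SIZE  ONE_BILLION

#define offSet(a,b,c,d) ( ((size_t) ARRAY_SIZE * (a)) + ((size_t) TREAM_COUNT * 800 * (b)) + ((size_t) 800 * (c)) + (d) )

static short *ptx;   /* file scope so that doWork() can see the buffer */

void doWork(int dogex, int ptxIndex, int carIndex);

void myFunc(int dogex, int ptxIndex, int xtreamIndex, int carIndex)
{
    /* the size_t cast keeps ARRAY_SIZE * 20 from overflowing an int */
    ptx = (short *) calloc((size_t) ARRAY_SIZE * 20, sizeof(short));

    #pragma omp parallel for
    for (int bIndex = 0; bIndex < 800; ++bIndex)
        doWork(dogex, ptxIndex, carIndex);
}

void doWork(int dogex, int ptxIndex, int carIndex)
{
    for (int treamIndex = 0; treamIndex < ONE_BILLION; ++treamIndex)
    {
        /* consecutive treamIndex values are 800 shorts apart in the buffer */
        short ptxValue     = ptx[ offSet(dogex, ptxIndex,   treamIndex, carIndex) ];
        short lastPtxValue = ptx[ offSet(dogex, ptxIndex-1, treamIndex, carIndex) ];

        // ....
    }
}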
1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 8:16 am

    The code allocates 20 blocks of one billion short ints. On a 64-bit Windows box, a short int is 2 bytes, so the allocation is about 40 gigabytes, not the 20 GB mentioned in the question.
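As a quick check of that arithmetic, here is a minimal sketch. The names allocation_bytes and print_allocation are hypothetical helpers, not part of the question's code, and the element count is taken from the calloc() call as posted.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: bytes requested by calloc(count, elem_size).
   The multiplication is done in size_t, which is 64 bits wide on a
   64-bit build, so 20 billion elements does not overflow. */
size_t allocation_bytes(size_t count, size_t elem_size)
{
    return count * elem_size;
}

/* Prints the size of the question's buffer: 20 blocks of one billion shorts. */
void print_allocation(void)
{
    size_t count = (size_t) 1000000000 * 20;
    size_t bytes = allocation_bytes(count, sizeof(short));
    printf("%zu bytes (%.1f GB)\n", bytes, (double) bytes / 1e9);
}
```

Writing the count as (size_t) 1000000000 * 20 matters: without the cast the product would be computed in int and overflow before calloc() ever sees it.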

    You say there are 24 cores and they’re all maxed out. The only parallelism the code shows is the OpenMP parallel for over bIndex, and doWork() never receives bIndex, so as posted every iteration repeats the same reads. The way in which the work is divided between threads could have a profound effect upon performance. You may need to provide more information.

    —

    Your basic problem, I suspect, revolves around cache behaviour and memory access limits.

    First, with two physical CPUs of six cores each, you will utterly saturate your memory bus. You probably have a NUMA architecture anyway, but nothing in the code controls where calloc() places the memory (e.g. a lot of the data could sit in memory which requires multiple hops to reach).

    Hyperthreading is turned on (24 logical cores on 12 physical ones). Two hardware threads share each core’s caches, which effectively halves the cache available to each thread. Given that the code is memory-bus bound rather than compute bound, hyperthreading is harmful. (Having said that, if the computation is constantly outside cache bounds anyway, this won’t change much.)

    Since some (much?) of the code has been removed, it’s not clear how the array is being accessed. The access pattern, and arranging that pattern to make good use of the cache, is the key to performance.

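    What I see in how offSet() is calculated is that the code constantly requires new virtual-to-physical address lookups, each of which requires something like four or five memory accesses. Consecutive values of treamIndex are 800 shorts (1,600 bytes) apart, so, assuming 4 KB pages, the loop moves on to a new page every two or three reads. This is killing performance, by itself.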
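The stride can be checked directly from the posted macro. This is only a sketch: ONE_BILLION and the 4,096-byte page size are assumptions (neither appears in the question), stride_bytes and pages_touched are hypothetical helper names, and the buffer is assumed to start on a page boundary.

```c
#include <stddef.h>

#define ONE_BILLION 1000000000   /* assumed value, not shown in the question */
#define TREAM_COUNT 9000
#define ARRAY_SIZE  ONE_BILLION
#define offSet(a,b,c,d) ( ((size_t) ARRAY_SIZE * (a)) + ((size_t) TREAM_COUNT * 800 * (b)) + ((size_t) 800 * (c)) + (d) )

#define PAGE_BYTES 4096          /* assumed page size (the usual x86-64 default) */

/* Distance in bytes between the elements read by consecutive treamIndex values. */
size_t stride_bytes(void)
{
    return (offSet(0, 0, 1, 0) - offSet(0, 0, 0, 0)) * sizeof(short);
}

/* Counts the distinct pages touched by the first `iterations` reads of the
   inner loop, assuming the buffer starts on a page boundary. */
size_t pages_touched(size_t iterations)
{
    size_t pages = 0;
    size_t last_page = (size_t) -1;
    for (size_t t = 0; t < iterations; ++t) {
        size_t page = offSet(0, 0, t, 0) * sizeof(short) / PAGE_BYTES;
        if (page != last_page) {   /* offsets only grow, so a new number is a new page */
            ++pages;
            last_page = page;
        }
    }
    return pages;
}
```

With 2-byte shorts, stride_bytes() comes to 1,600, so each read uses only 2 bytes of the cache line it pulls in and pages_touched() grows almost as fast as the iteration count.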

    My basic advice would be to break the array up into level 2 cache-sized blocks, give one block to each CPU, and let it process that block. You can do that in parallel. Actually, you might be able to use hyperthreading to pre-load the cache, but that’s a more advanced technique.
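A minimal sketch of that blocking idea follows. It is not the asker's code: process_block and process_in_blocks are hypothetical names, the summation stands in for the work the question elides, and the 256 KB block size is an assumed L2 size that should be checked against the actual CPU.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK_BYTES (256 * 1024)   /* assumed L2 size per core; check the real CPU */

/* Placeholder for the real per-element work, which the question elides. */
static long long process_block(const short *block, size_t count)
{
    long long sum = 0;
    for (size_t i = 0; i < count; ++i)   /* contiguous reads use every cache line fully */
        sum += block[i];
    return sum;
}

/* Splits data[0..n) into cache-sized blocks and hands each block to one thread. */
long long process_in_blocks(const short *data, size_t n)
{
    const size_t block_elems = BLOCK_BYTES / sizeof(short);
    const long long block_count = (long long) ((n + block_elems - 1) / block_elems);
    long long total = 0;

    assert(data != NULL || n == 0);

    /* Without OpenMP enabled the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for reduction(+:total)
    for (long long b = 0; b < block_count; ++b) {
        size_t start = (size_t) b * block_elems;
        size_t count = (n - start < block_elems) ? (n - start) : block_elems;
        total += process_block(data + start, count);
    }
    return total;
}
```

This only helps if the elements a thread needs are contiguous within its block, so the layout implied by offSet() would probably have to be reordered first. The loop index is a signed type because older OpenMP implementations, including the one shipped with Visual C, require that.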

