The Archive Base

Editorial Team
Asked: May 27, 2026 (2026-05-27T15:07:11+00:00)

I have a program with the general structure shown below. Basically, I have a vector of objects. Each object has member vectors, and one of those is a vector of structs that contain more vectors. Using multithreading, the objects are operated on in parallel, doing computation that involves heavy reading and modifying of member vector elements. Each object is accessed by only one thread at a time, and is copied to that thread’s stack for processing.

The problem is that the program fails to scale up to 16 cores. I suspect, and have been advised, that the issue may be false sharing and/or cache invalidation. If this is true, it seems that the cause must be vectors allocating memory too close to each other, as it is my understanding that both problems are (in simple terms) caused by nearby memory addresses being accessed simultaneously by different processors. Does this reasoning make sense, and is it likely that this could happen? If so, it seems that I can solve the problem by padding the member vectors using .reserve() to add extra capacity, leaving large spaces of empty memory between vector arrays. So, does all this make any sense? Am I totally out to lunch here?

#include <pthread.h>
#include <vector>
using std::vector;

struct str {
    vector<float> a;   vector<int> b;   vector<bool> c;
};

class object {
public:
    vector<str> a;     vector<int> b;   vector<float> c;
    //more vectors, etc ...
    void DoWork();            //heavy use of vectors
};

void* Consumer(void* argument);

int main() {
    vector<object> objs;
    vector<object>* p_objs = &objs;   // pointer handed to every thread

    //...make `thread_list` and `attr`
    for (int q = 0; q < NUM_THREADS; q++)
        pthread_create(&thread_list[q], &attr, Consumer, p_objs);
    //...
}

void* Consumer(void* argument) {
    vector<object>* p_objs = (vector<object>*) argument;
    while (1) {
        size_t index = queued++;  //imagine queued is thread-safe global
        if (index >= p_objs->size()) break;   // no work left
        object obj = (*p_objs)[index];        // copy into this thread
        obj.DoWork();
        (*p_objs)[index] = obj;               // copy the result back
    }
    return nullptr;
}
1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 3:07 pm

    Well, the last vector copied in thread 0 is objs[0].c. The first vector copied in thread 1 is objs[1].a[0].a. So if their two blocks of allocated data happen to both occupy the same cache line (64 bytes, or whatever it actually is for that CPU), you’d have false sharing.

    The same is true of any two vectors involved. For the sake of a concrete example, I have assumed that thread 0 runs first and finishes its allocations before thread 1 starts allocating, and that the allocator tends to place consecutive allocations next to each other.
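The adjacency condition described above can be checked directly. The following is a minimal sketch, assuming a 64-byte cache line (the real size depends on the CPU); the helper names `cache_line_of` and `blocks_share_line` are hypothetical and not part of any library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumption: 64-byte cache lines. This is common but not universal.
constexpr std::size_t kAssumedLineSize = 64;

// Index of the cache line that contains address p.
inline std::uintptr_t cache_line_of(const void* p,
                                    std::size_t line_size = kAssumedLineSize) {
    assert(line_size != 0);
    return reinterpret_cast<std::uintptr_t>(p) / line_size;
}

// True if the last byte of block A lies on the same cache line as the
// first byte of block B, which is the layout that permits false sharing
// when two threads write to A and B respectively.
inline bool blocks_share_line(const void* a_begin, std::size_t a_bytes,
                              const void* b_begin,
                              std::size_t line_size = kAssumedLineSize) {
    if (a_bytes == 0) return false;  // an empty block touches no line
    const char* a_last = static_cast<const char*>(a_begin) + a_bytes - 1;
    return cache_line_of(a_last, line_size) == cache_line_of(b_begin, line_size);
}
```

For two vectors `x` and `y` of element type `T`, one might call `blocks_share_line(x.data(), x.size() * sizeof(T), y.data())`. Note that `vector<bool>` is bit-packed and has no `data()`, so it cannot be inspected this way.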

    reserve() might keep the parts of each block that you actually touch from occupying the same cache line. Another option is per-thread memory allocation: if those vectors’ blocks are allocated from different pools, they can’t possibly occupy the same line unless the pools do.
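A related technique, which is neither the reserve() padding nor the per-thread pools mentioned above, is to give each vector an allocator that starts every block on a cache-line boundary and rounds its size up to whole lines. The sketch below assumes C++17 and a 64-byte line; `LinePaddedAllocator` is a hypothetical name used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical allocator: every block starts on a cache-line boundary and
// occupies a whole number of lines, so two blocks never share a line.
template <class T>
struct LinePaddedAllocator {
    using value_type = T;
    static constexpr std::size_t kLine = 64;  // assumed line size

    LinePaddedAllocator() = default;
    template <class U>
    LinePaddedAllocator(const LinePaddedAllocator<U>&) noexcept {}

    // Bytes needed for n elements, rounded up to a multiple of kLine.
    static std::size_t padded_bytes(std::size_t n) {
        return (n * sizeof(T) + kLine - 1) / kLine * kLine;
    }

    T* allocate(std::size_t n) {
        void* p = ::operator new(padded_bytes(n), std::align_val_t(kLine));
        assert(reinterpret_cast<std::size_t>(p) % kLine == 0);
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
        ::operator delete(p, std::align_val_t(kLine));
    }
};

// All instances are interchangeable, as the allocator requirements expect
// for a stateless allocator.
template <class T, class U>
bool operator==(const LinePaddedAllocator<T>&, const LinePaddedAllocator<U>&) noexcept {
    return true;
}
template <class T, class U>
bool operator!=(const LinePaddedAllocator<T>&, const LinePaddedAllocator<U>&) noexcept {
    return false;
}
```

A member would then be declared as, for example, `std::vector<float, LinePaddedAllocator<float>> c;`. This removes line sharing between blocks, but it does nothing about contention inside the underlying allocator itself.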

    If you don’t have per-thread allocators, the problem could be contention on the memory allocator, if DoWork reallocates the vectors a lot. Or it could be contention on any other shared resource used by DoWork. Basically, imagine that each thread spends 1/K of its time doing something that requires global exclusive access. Then it might appear to parallelize reasonably well up to a certain number J <= K, at which point acquiring the exclusive access significantly eats into the speed-up because cores are spending a significant proportion of time idle. Beyond K cores there’s approximately no improvement at all with extra cores, because the shared resource cannot work any faster.

    At the absurd end of this, imagine some work that spends 1/K of its time holding a global lock, and (K-1)/K of its time waiting on I/O. Then the problem appears to be embarrassingly parallel almost up to K threads (irrespective of the number of cores), at which point it stops dead.
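The saturation argument above can be written as a small formula. This is an idealized sketch under stated assumptions: each unit of work spends exactly 1/K of its time holding one globally exclusive resource, and queuing delays are ignored, so a real program would flatten out somewhat earlier (the J <= K mentioned above). The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Upper bound on speedup with n workers when each unit of work spends 1/k
// of its time holding a globally exclusive resource. The resource can
// serve at most k units per single-thread unit time, and n workers can
// finish at most n, so throughput is capped by the smaller of the two.
inline double speedup_bound(double n, double k) {
    assert(n >= 1.0 && k >= 1.0);
    return std::min(n, k);
}

// Fraction of the n workers doing useful work at that bound.
inline double efficiency_bound(double n, double k) {
    return speedup_bound(n, k) / n;
}
```

With K = 10, for instance, the bound grows linearly up to 10 workers and then stays at 10, so 16 cores could be at most 62.5% utilized in this model.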

    So, don’t focus on false sharing until you’ve ruled out true sharing 😉


© 2021 The Archive Base. All Rights Reserved