Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 3672726
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 19, 20262026-05-19T02:38:40+00:00 2026-05-19T02:38:40+00:00

I am curious to know if the allocating memory using a default new operator

  • 0

I am curious to know if the allocating memory using a default new operator is a non-blocking operation.

e.g.

struct Node {
    int a,b;
};

…

Node foo = new Node();

If multiple threads tried to create a new Node and if one of them was suspended by the OS in the middle of allocation, would it block other threads from making progress?

The reason why I ask is because I had a concurrent data structure that created new nodes. I then modified the algorithm to recycle the nodes. The throughput performance of the two algorithms was virtually identical on a 24 core machine. However, I then created an interference program that ran on all the system cores in order to create as much OS pre-emption as possible. The throughput performance of the algorithm that created new nodes decreased by a factor of 5 relative the algorithm that recycled nodes.

I’m curious to know why this would occur.

Thanks.

*Edit : pointing me to the code for the c++ memory allocator for linux would be helpful as well. I tried looking before posting this question, but had trouble finding it.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-19T02:38:41+00:00Added an answer on May 19, 2026 at 2:38 am

    seems to me if your interference app were using new/delete (malloc/free), then the interference app’s would interfere with the non recycle test more. But I don’t know how your interference test is implemented.

    Depending on how you recycle (ie if you use pthread mutexes god forbid) your recycle code could be slow (gcc atomic ops would be 40x faster at implementing recycle).

    Malloc, in some variation for a long time on at least some platforms, has been aware of threads. Use the compiler switches on gcc to be sure you get it. Newer algorithms maintain pools of small memory chunks for each thread, so there is no or little blocking if your thread has the small item available. I have over simplified this and it depends on what malloc your system is using. Plus, if you go and allocate millions of items to do a test….well then you wont see that effect, because the small item pools are limited in size. Or maybe you will. I don’t know. If you freed the item right after allocating, you would be more likely to see it. Freed small items go back into the small item lists rather than the shared heap. Although “what happens when thread B frees an item allocated by thread A” is a problem that may or may not be dealt with on your version of malloc and may not be dealt with in a non blocking manner. For sure, if you didn’t immediately free during a large test, then the the thread would have to refill its small item list many times. That can block if more than one thread tries. Finally, at some point your process’ heap will ask the system for heap memory, which can obviously block.

    So are you using small memory items? For your malloc I don’t know what small would be, but if you are < 1k that is for sure small. Are you allocating and freeing one after the other, or allocating thousands of nodes and then freeing thousands of nodes? Was your interference app allocating? All of these things will affect the results.

    How to recycle with atomic ops (CAS = compare and swap):

    First add a pNextFreeNode to your node object. I used void*, you can use your type. This code is for 32 bit pointers, but works for 64 bit as well. Then make a global recycle pile.

    void *_pRecycleHead; // global head of recycle list. 
    

    Add to recycle pile:

    void *Old;
    while (1) { // concurrency loop
      Old = _pRecycleHead;  // copy the state of the world. We operate on the copy
      pFreedNode->pNextFreeNode = Old; // chain the new node to the current head of recycled items
      if (CAS(&_pRecycleHead, Old, pFreedNode))  // switch head of recycled items to new node
        break; // success
    }
    

    remove from pile:

    void *Old;
    while (Old = _pRecycleHead) { // concurrency loop, only look for recycled items if the head aint null
      if (CAS(&_pRecycleHead, Old, Old->pNextFreeNode))  // switch head to head->next.
        break; // success
    }
    pNodeYoucanUseNow = Old;
    

    Using CAS means the operation will succeed only if the item you are changing is the Old value you pass in. If there is a race and another thread got there first, then the old value will be different. In real life this race happens very very rarely. CAS is only slighlty slower than actually setting a value so compared to mutexes….it rocks.

    The remove from pile, above, has a race condition if you add and remove the same item rapidly. We solve that by adding a version # to the CAS’able data. If you do the version # at the same time as the pointer to the head of the recycle pile you win. Use a union. Costs nothing extra to CAS 64 bits.

    union TRecycle {
      struct {
        int iVersion;
        void *pRecycleHead;
      } ;  // we can set these.  Note, i didn't name this struct.  You may have to if you want ANSI
      unsigned long long n64;  // we cas this
    }
    

    Note, You will have to go to 128 bit struct for 64 bit OS. so the global recycle pile looks like this now:

    TRecycle _RecycleHead;
    

    Add to recycle pile:

    while (1) { // concurrency loop
      TRecycle New,Old;
      Old.n64 = _RecycleHead.n64;  // copy state
      New.n64 = Old.n64;  // new state starts as a copy
      pFreedNode->pNextFreeNode = Old.pRecycleHead;  // link item to be recycled into recycle pile
      New.pRecycleHead = pFreedNode;  // make the new state
      New.iVersion++;  // adding item to list increments the version.
      if (CAS(&_RecycleHead.n64, Old.n64, New.n64))  // now if version changed...we fail
        break; // success
    }
    

    remove from pile:

    while (1) { // concurrency loop
      TRecycle New,Old;
      Old.n64 = _RecycleHead.n64;  // copy state
      New.n64 = Old.n64;  // new state starts as a copy
      New.pRecycleHead = New.pRecycledHead.pNextFreeNode;  // new will skip over first item in recycle list so we can have that item.
      New.iVersion++;  // taking an item off the list increments the version.
      if (CAS(&_RecycleHead.n64, Old.n64, New.n64))  // we fail if version is different.
        break; // success
    }
    pNodeYouCanUseNow = Old.pRecycledHead;
    

    I bet if you recycle this way you will see a perf increase.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I am curious as to what others are using in this situation. I know
I am just curious to know why this is allowed. int i=0;; I accidentaly
Curious to know how people set up their personal and/or work development environment, in
I'm curious to know how NULLs are stored into a database ? It surely
I'm curious to know whether the PDFKit framework is available for use within the
I am curious to know how the Loader maps DLL into Process Address Space.
I'm curious to know if there is an easy way to mock an IMAP
I am curious to know where the Don't Fragment [DF] Bit of the IP
I am a little curious to know about how OpenID authentication works. Is there
I came across this problem while preparing for an interview and curious to know

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.