The Archive Base


Editorial Team
Asked: May 24, 2026


Compiler: clang++ x86-64 on linux.

It has been a while since I have written any intricate low-level system code, and I usually program against the system primitives (Windows and pthreads/POSIX), so the ins and outs have slipped from my memory. I am working with boost::asio and boost::thread at the moment.

In order to emulate synchronous RPC against an asynchronous function executor (a boost::io_service with multiple threads calling io_service::run, where requests are submitted via io_service::post), I am using Boost synchronization primitives. For curiosity’s sake I decided to sizeof the primitives. This is what I get:

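#include <iostream>
#include <boost/thread/mutex.hpp>
#include <boost/thread/condition_variable.hpp>

struct notification_object
{
  bool ready;
  boost::mutex m;
  boost::condition_variable v;
};

int main()
{
  // Print the size of each member type, then of the whole struct.
  std::cout << sizeof(bool) << std::endl;
  std::cout << sizeof(boost::mutex) << std::endl;
  std::cout << sizeof(boost::condition_variable) << std::endl;
  std::cout << sizeof(notification_object) << std::endl;
  return 0;
}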

Output:

1
40
88
136

Forty bytes for a mutex?! 88 for a condition_variable?! Please keep in mind that I’m repulsed by this bloated size because I am thinking of an application that could create hundreds of notification_objects.

This level of overhead for portability seems ridiculous, can someone justify this? As far as I can remember these primitives should be 4 or 8 bytes wide depending on the memory model of the CPU.


1 Answer

  1. Editorial Team
    Added an answer on May 24, 2026 at 4:52 am

    When you look at the “size overhead” of any synchronization primitive, keep in mind that these cannot be packed too closely. Two mutexes sharing a cacheline, for example, end up thrashing the cache (false sharing) if they’re in use concurrently, even if the users acquiring these locks never “conflict”. Imagine two threads running two loops:

    for (;;) {
        lock(lockA);
        unlock(lockA);
    }
    

    and

    for (;;) {
        lock(lockB);
        unlock(lockB);
    }
    

    You will see roughly twice the total number of iterations when the loops run on two different threads, compared to one thread running one loop, only if the two locks are not within the same cacheline. If lockA and lockB are in the same cacheline, the number of iterations per thread will roughly halve, because the cacheline holding those two locks permanently bounces between the CPU cores executing the two threads.
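To try the effect described above on your own machine, here is a minimal sketch using std::thread and std::atomic in place of real locks (an atomic increment stands in for lock()/unlock()). The names packed_pair, padded_pair, hammer and run_pair are made up for illustration, and the 64-byte line size is an assumption about the target CPU.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>

// Assumed cacheline size; adjust for the CPU you are measuring on.
constexpr std::size_t kCacheLine = 64;

// Two counters packed next to each other: they will usually share a cacheline.
struct packed_pair {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Two counters forced onto separate cachelines.
struct padded_pair {
    alignas(kCacheLine) std::atomic<long> a{0};
    alignas(kCacheLine) std::atomic<long> b{0};
};

// Hammer one counter with atomic increments (stand-in for lock/unlock).
inline void hammer(std::atomic<long>& counter, long iterations) {
    for (long i = 0; i < iterations; ++i)
        counter.fetch_add(1, std::memory_order_relaxed);
}

// Run one thread per counter and return the elapsed wall time in milliseconds.
template <typename Pair>
double run_pair(Pair& p, long iterations) {
    assert(iterations >= 0);
    auto start = std::chrono::steady_clock::now();
    std::thread t1([&p, iterations] { hammer(p.a, iterations); });
    std::thread t2([&p, iterations] { hammer(p.b, iterations); });
    t1.join();
    t2.join();
    std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling run_pair on a packed_pair and on a padded_pair with the same, large iteration count and comparing the returned times should show the difference. The exact numbers depend heavily on the CPU and on whether the two threads actually land on different cores.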

    Hence even though the actual data size of the primitive data type underlying a spinlock or mutex might only be a byte or a 32-bit word, the effective data size of such an object is often larger.

    Keep that in mind before asserting “my mutexes are too large”. In fact, on x86/x64, 40 bytes is too small to prevent false sharing, as cachelines there are currently at least 64 bytes.
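For what it's worth, the numbers in the question look like plain pthread sizes rather than Boost overhead. As far as I know, on x86-64 Linux with glibc sizeof(pthread_mutex_t) is 40 and sizeof(pthread_cond_t) is 48, and a condition variable that keeps an internal mutex next to its pthread_cond_t would come to 40 + 48 = 88. With the bool padded to 8 bytes that gives 8 + 40 + 88 = 136. The sketch below checks this with the raw pthread types. cv_like and notification_like are hypothetical stand-ins that mirror such a layout; they are not Boost's actual classes.

```cpp
#include <cstddef>
#include <iostream>
#include <pthread.h>

// Hypothetical mirror of a condition variable that keeps its own
// internal mutex next to the pthread condition object.
struct cv_like {
    pthread_mutex_t internal_mutex;
    pthread_cond_t cond;
};

// Same shape as the notification_object from the question.
struct notification_like {
    bool ready;
    pthread_mutex_t m;
    cv_like v;
};

// Print the raw sizes so they can be compared with the Boost numbers.
inline void print_sizes() {
    std::cout << "pthread_mutex_t:   " << sizeof(pthread_mutex_t) << '\n'
              << "pthread_cond_t:    " << sizeof(pthread_cond_t) << '\n'
              << "cv_like:           " << sizeof(cv_like) << '\n'
              << "notification_like: " << sizeof(notification_like) << '\n';
}
```

If print_sizes() reports the same values as the Boost types on your platform, the size comes from the underlying pthread objects, and a thinner wrapper would not save anything.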

    Beyond that, if you’re highly concerned about memory usage, consider that notification objects need not be unique – condition variables can serve to trigger for different events (via the predicate that boost::condition_variable knows about). It’d therefore be possible to use a single mutex/CV pair for a whole state machine instead of one such pair per state. Same goes for e.g. thread pool synchronization – having more locks than threads is not necessarily beneficial.
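A minimal sketch of that idea, assuming at most 64 distinct events: one mutex/CV pair is shared and every waiter passes its own predicate. I've used std::mutex and std::condition_variable instead of the Boost types to keep it self-contained (the Boost interfaces are very similar). event_hub, signal and wait_for are hypothetical names.

```cpp
#include <bitset>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>

// Hypothetical hub: one mutex/CV pair shared by up to 64 events.
class event_hub {
public:
    static constexpr std::size_t kMaxEvents = 64;

    // Mark event `id` as ready and wake all waiters; each waiter
    // re-checks its own predicate and goes back to sleep if it fails.
    void signal(std::size_t id) {
        assert(id < kMaxEvents);
        {
            std::lock_guard<std::mutex> lock(m_);
            ready_.set(id);
        }
        cv_.notify_all();
    }

    // Block until event `id` is ready or `timeout` expires.
    // Returns true and consumes the event on success, false on timeout.
    bool wait_for(std::size_t id, std::chrono::milliseconds timeout) {
        assert(id < kMaxEvents);
        std::unique_lock<std::mutex> lock(m_);
        bool ok = cv_.wait_for(lock, timeout, [this, id] { return ready_.test(id); });
        if (ok)
            ready_.reset(id);
        return ok;
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::bitset<kMaxEvents> ready_;
};
```

With this, hub.signal(3) from a worker thread releases a caller blocked in hub.wait_for(3, timeout). The price is that notify_all wakes every waiter so each can re-check its predicate, which trades some wakeup overhead for memory.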

    Edit: For a few more references on “false sharing” (and the negative performance impact caused by hosting multiple atomically-updated variables within the same cacheline), see (amongst others) the following SO postings:

    • false sharing in boost::detail::spinlock_pool?
    • False sharing and pthreads
    • False Sharing and Atomic Variables

    As said, when using multiple “synchronization objects” (whether that’d be atomically-updated variables, locks, semaphores, …) in a multi-core, cache-per-core config, allow each of them a separate cacheline of space. You’re trading memory usage for scalability here, but really, if you get into the region where your software needs several millions of locks (making that GBs of mem), you either have the funding for a few hundred GB of memory (and a hundred CPU cores), or you’re doing something wrong in your software design.
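If you do want each lock on its own cacheline, a small wrapper is enough. This is a sketch under the assumption of 64-byte lines. cacheline_padded and lock_pool are hypothetical names, and std::mutex stands in for whichever lock type you use.

```cpp
#include <cstddef>
#include <mutex>

// Assumed cacheline size; adjust for the target CPU.
constexpr std::size_t kAssumedLineSize = 64;

// Wrapper that aligns T to a cacheline boundary; the compiler rounds the
// size up to a multiple of the alignment, so neighbours never share a line.
template <typename T>
struct alignas(kAssumedLineSize) cacheline_padded {
    T value;
};

static_assert(alignof(cacheline_padded<std::mutex>) == kAssumedLineSize,
              "wrapper must start on a cacheline boundary");
static_assert(sizeof(cacheline_padded<std::mutex>) % kAssumedLineSize == 0,
              "wrapper must occupy whole cachelines");

// Hypothetical fixed pool of locks, each on its own cacheline.
struct lock_pool {
    static constexpr std::size_t kLocks = 8;
    cacheline_padded<std::mutex> locks[kLocks];

    // Map an arbitrary key onto one of the pooled locks.
    std::mutex& for_key(std::size_t key) { return locks[key % kLocks].value; }
};
```

A caller would then write something like std::lock_guard<std::mutex> guard(pool.for_key(id)); the memory cost is one or more full lines per lock instead of sizeof(std::mutex).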

    In most cases (a lock / an atomic for a specific instance of a class / struct), you get the “padding” for free as long as the object instance that contains the atomic variable is large enough.

© 2021 The Archive Base. All Rights Reserved