Editorial Team
Asked: May 29, 2026

Let’s assume we have a fixed amount of computational work, with no blocking, sleeping, or I/O waiting. The work parallelizes very well: it consists of 100M small, independent calculation tasks.

Which is faster on a 4-core CPU: running 4 threads or, let’s say, 50? Why should the second variant be slower, and by how much?

My assumption: when you run 4 heavy threads on a 4-core CPU with no other CPU-consuming processes or threads, the scheduler does not need to move the threads between cores at all; it has no reason to do so in this situation. Core 0 (the main CPU) will be responsible for executing the hardware timer interrupt handler 250 times per second (the basic Linux configuration) as well as the other hardware interrupt handlers, but the other cores may not be disturbed at all.

What is the cost of context switching? Is it the time to store and restore the CPU registers for a different context? What about caches, pipelines, and the various prediction mechanisms inside the CPU? Can we say that each context switch hurts the caches, pipelines, and some instruction-decoding facilities in the CPU? So the more threads execute on a single core, the less work they get done together compared to running serially?

What interests me most right now is how caches and other hardware optimizations behave in a multithreaded environment.

1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 9:32 am

    As @Baile mentions in the comments, this is highly application-, system-, and environment-specific.

    As such, I’m not going to take the hard-line approach of prescribing exactly 1 thread per core (or 2 threads per core in the case of Hyperthreading).

    As an experienced shared-memory programmer, I have seen the optimal number of threads (for a 4-core machine) range anywhere from 1 to 64+.

    Now I will enumerate the situations that can cause this range:

    Optimal Threads < # of Cores

    In tasks that are parallelized at a very fine grain (such as small FFTs), threading overhead is the dominant performance factor. In some cases, it’s not helpful to parallelize at all. In others, you get a speedup with 2 threads but negative scaling at 4 threads.

    Another issue is resource contention. Even if you have a highly parallelizable task that splits easily across 4 cores/threads, you may be bottlenecked by memory bandwidth and cache effects. So you often find that 2 threads are just as fast as 4 threads (as is often the case with very large FFTs).
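    The bandwidth ceiling can be sketched with a toy model. This is only an illustration under stated assumptions: every thread streams memory at the same rate, and the function name and the GB/s figures are hypothetical rather than measurements.

```python
def bandwidth_limited_speedup(threads, per_thread_gbps, total_gbps):
    """Toy model: speedup over 1 thread when all threads stream memory equally.

    Threads scale linearly until their combined demand saturates the
    memory bus; past that point extra threads add nothing.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    # Number of threads the memory system can feed at full speed.
    saturation = total_gbps / per_thread_gbps
    # A single thread is the baseline, so the speedup never drops below 1.
    return min(float(threads), max(1.0, saturation))


# Hypothetical machine: each thread wants 10 GB/s, the bus delivers 20 GB/s.
for n in (1, 2, 4):
    print(n, bandwidth_limited_speedup(n, 10.0, 20.0))
```

    With these assumed numbers the model gives the same speedup at 2 and 4 threads, which is the "2 threads as fast as 4" pattern described above. Real machines add cache effects that this model ignores.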

    Optimal Threads = # of Cores

    This is the optimal case. No need to explain here – one thread per core. Most embarrassingly parallel applications that are not memory or I/O bound fit right here.
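    For the 100M-task workload in the question, the one-worker-per-core split might look like the sketch below. The helper name `chunk_bounds` is hypothetical. Note that `os.cpu_count()` reports logical CPUs (so it counts hyperthreads) and can return `None`. In standard CPython builds, CPU-bound work usually needs processes rather than threads to run in parallel.

```python
import os


def chunk_bounds(n_tasks, n_workers):
    """Split range(n_tasks) into n_workers contiguous [start, stop) chunks.

    Chunk sizes differ by at most one task, so regular workloads stay balanced.
    """
    base, extra = divmod(n_tasks, n_workers)
    bounds = []
    start = 0
    for w in range(n_workers):
        # The first `extra` workers take one additional task each.
        size = base + (1 if w < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# One worker per logical CPU; fall back to 1 if the count is unknown.
n_workers = os.cpu_count() or 1
print(chunk_bounds(100_000_000, 4))
```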

    Optimal Threads > # of Cores

    This is where it gets interesting… very interesting. Have you heard about load-imbalance? How about over-decomposition and work-stealing?

    Many parallelizable applications are irregular, meaning that the tasks do not split into sub-tasks of equal size. So you may end up splitting a large task into 4 unequal pieces, assigning them to 4 threads, and running them on 4 cores… the result? Poor parallel performance, because 1 thread happened to get 10x more work than the other threads.

    A common solution here is to over-decompose the task into many sub-tasks. You can either create a thread for each one of them (so now you get threads >> cores), or you can use some sort of task scheduler with a fixed number of threads. Not all tasks are suited to both approaches, so quite often, over-decomposing a task into 8 or 16 threads on a 4-core machine gives optimal results.
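    The effect of over-decomposition can be illustrated with a small simulation. This is a sketch, not a benchmark: it assumes zero scheduling overhead, assumes the heavy piece can be cut evenly, and uses made-up task costs. The `makespan` helper is hypothetical and models a fixed pool of workers pulling tasks from a shared queue.

```python
import heapq


def makespan(task_costs, n_workers):
    """Finish time when each idle worker takes the next task from a shared queue."""
    finish = [0.0] * n_workers  # min-heap of per-worker finish times
    heapq.heapify(finish)
    for cost in task_costs:
        # The worker that becomes idle first picks up the next task.
        earliest = heapq.heappop(finish)
        heapq.heappush(finish, earliest + cost)
    return max(finish)


# Irregular split: one piece carries 10x the work of each of the others.
coarse = [10.0, 1.0, 1.0, 1.0]
# The same 13 units of work over-decomposed into 52 pieces of 0.25 each.
fine = [0.25] * 52

print(makespan(coarse, 4))  # limited by the single heavy piece
print(makespan(fine, 4))    # close to the ideal 13 / 4
```

    In this model the coarse split finishes at 10.0 units, while the fine split finishes at 3.25, which is the ideal for 13 units of work on 4 workers.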


    Although spawning more threads can lead to better load-balance, the overhead builds up. So there’s typically an optimal point somewhere. I’ve seen as high as 64 threads on 4 cores. But as mentioned, it’s highly application specific. And you need to experiment.
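    That trade-off can be made concrete with a toy simulation. Every number here is an assumption chosen for illustration: 64 items where item i costs i + 1 units (a triangular workload), 4 workers, and a fixed overhead of 20 units per chunk. `run_time` is a hypothetical helper, not a measurement of any real system.

```python
import heapq


def run_time(n_items, n_chunks, n_workers, overhead):
    """Simulated finish time for a triangular workload cut into equal-count chunks.

    Item i costs i + 1 units, and every chunk pays a fixed `overhead`
    for creation and scheduling. Assumes n_chunks divides n_items.
    """
    per_chunk = n_items // n_chunks
    chunk_costs = [
        sum(i + 1 for i in range(c * per_chunk, (c + 1) * per_chunk)) + overhead
        for c in range(n_chunks)
    ]
    finish = [0] * n_workers  # min-heap of per-worker finish times
    for cost in chunk_costs:
        # The earliest-idle worker (finish[0]) takes the next chunk in order.
        heapq.heapreplace(finish, finish[0] + cost)
    return max(finish)


times = {k: run_time(64, k, 4, 20) for k in (4, 8, 16, 32, 64)}
for k, t in times.items():
    print(k, t)
```

    In this model, 4 chunks finish at 924 units, 8 chunks at 752, and 16 chunks at 696. With 64 chunks, the overhead alone pushes the finish time to at least 840. So the best chunk count sits somewhere in the middle, and where exactly depends on the assumed overhead.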


    EDIT: Expanding the answer to address the question more directly…

    What is the cost of context switching? Is it the time to store and restore
    the CPU registers for a different context?

    This is very dependent on the environment, and it is somewhat difficult to measure directly.
    Short answer: very expensive. This might be a good read.

    What about caches, pipelines, and the various prediction mechanisms inside
    the CPU? Can we say that each context switch hurts the caches,
    pipelines, and some instruction-decoding facilities in the CPU?

    Short answer: yes. When a thread is switched out, the pipeline is likely flushed and the predictors are disturbed. The same goes for the caches: the new thread is likely to evict the old thread’s data and replace it with its own.

    There’s a catch, though. In some applications where the threads share the same data, one thread can “warm” the cache for another incoming thread, or for a thread on a different core that shares the same cache. (Although it is rare, I’ve seen this happen on one of my NUMA machines: superlinear speedup, 17.6x across 16 cores!?!?!)

    So the more threads execute on a single core, the less work they get done
    together compared to running serially?

    Depends, depends… Hyperthreading aside, there will definitely be overhead. But I’ve read a paper where someone used a second thread to prefetch for the main thread… Yes it’s crazy…
