Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 6993073
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 27, 20262026-05-27T19:43:03+00:00 2026-05-27T19:43:03+00:00

My application scenario is like this: I want to evaluate the performance gain one

  • 0

My application scenario is like this: I want to evaluate the performance gain one can achieve on a quad-core machine for processing the same amount of data. I have following two configurations:

i) 1-Process: A program without any threading and processes data from 1M .. 1G, while system was assumed to run only single core of its 4-cores.

ii) 4-threads-Process: A program with 4-threads (all threads performing same operation) but processing 25% of the input data.

In my program for creating 4-threads, I used pthread’s default options (i.e., without any specific pthread_attr_t). I believe the performance gain of 4-thread configuration comparing to 1-Process configuration should be closer to 400% (or somewhere between 350% and 400%).

I profiled the time spent in creation of threads just like this below:

timer_start(&threadCreationTimer); 
pthread_create( &thread0, NULL, fun0, NULL );
pthread_create( &thread1, NULL, fun1, NULL );
pthread_create( &thread2, NULL, fun2, NULL );
pthread_create( &thread3, NULL, fun3, NULL );
threadCreationTime = timer_stop(&threadCreationTimer);

pthread_join(&thread0, NULL);
pthread_join(&thread1, NULL);
pthread_join(&thread2, NULL);
pthread_join(&thread3, NULL);    

Since increase in the size of the input data may also increase in the memory requirement of each thread, then so loading all data in advance is definitely not a workable option. Therefore, in order to ensure not to increase the memory requirement of each thread, each thread reads data in small chunks, process it and reads next chunk process it and so on. Hence, structure of the code of my functions run by threads is like this:

timer_start(&threadTimer[i]);
while(!dataFinished[i])
{
    threadTime[i] += timer_stop(&threadTimer[i]);
    data_source();
    timer_start(&threadTimer[i]);
    process();
}
threadTime[i] += timer_stop(&threadTimer[i]);

Variable dataFinished[i] is marked true by process when the it received and process all needed data. Process() knows when to do that 🙂

In the main function, I am calculating the time taken by 4-threaded configuration as below:

execTime4Thread = max(threadTime[0], threadTime[1], threadTime[2], threadTime[3]) + threadCreationTime.

And performance gain is calculated by simply

gain = execTime1process / execTime4Thread * 100

Issue:
On small data size around 1M to 4M, the performance gain is generally good (between 350% to 400%). However, the trend of performance gain is exponentially decreasing with increase in the input size. It keeps decreasing until some data size of upto 50M or so, and then become stable around 200%. Once it reached that point, it remains almost stable for even 1GB of data.

My question is can anybody suggest the main reasoning of this behaviour (i.e., performance drop at the start and but remaining stable later)?

And suggestions how to fix that?

For your information, I also investigated the behaviour of threadCreationTime and threadTime for each thread to see what’s happening. For 1M of data the values of these variables are small and but with increase in the data size both these two variables increase exponentially (but threadCreationTime should remain almost same regardless of data size and threadTime should increase at a rate corresponding to data being processing). After keep on increasing until 50M or so threadCreationTime becomes stable and threadTime (just like performance drop becomes stable) and threadCreationTime keep increasing at a constant rate corresponding to increase in data to be processed (which is considered understandable).

Do you think increasing the stack size of each thread, process priority stuff or custom values of other parameters type of scheduler (using pthread_attr_init) can help?

PS: The results are obtained while running the programs under Linux’s fail safe mode with root (i.e., minimal OS is running without GUI and networking stuff).

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-27T19:43:03+00:00Added an answer on May 27, 2026 at 7:43 pm

    Since increase in the size of the input data may also increase in the
    memory requirement of each thread, then so loading all data in advance
    is definitely not a workable option. Therefore, in order to ensure not
    to increase the memory requirement of each thread, each thread reads
    data in small chunks, process it and reads next chunk process it and
    so on.

    Just this, alone, can cause a drastic speed decrease.

    If there is sufficient memory, reading one large chunk of input data will always be faster than reading data in small chunks, especially from each thread. Any I/O benefits from chunking (caching effects) disappears when you break it down into pieces. Even allocating one big chunk of memory is much cheaper than allocating small chunks many, many times.

    As a sanity check, you can run htop to ensure that at least all your cores are being topped out during the run. If not, your bottleneck could be outside of your multi-threading code.

    Within the threading,

    • threading context switches due to many threads can cause sub-optimal speedup
    • as mentioned by others, a cold cache due to not reading memory contiguously can cause slowdowns

    But re-reading your OP, I suspect the slowdown has something to do with your data input/memory allocation. Where exactly are you reading your data from? Some kind of socket? Are you sure you need to allocate memory more than once in your thread?

    Some algorithm in your worker threads is likely to be suboptimal/expensive.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Is there some open source file upload (application) solution for a scenario like this?
My application is looking like the below snap in some scenario. There is one
Scenario I have a windows forms application. I want to use two different WCF
User scenario is like follows. My application has list of companies. Each company has
I want to implement single sign on in my asp.net web application. Scenario is
I want to build a web application (SaaS) that can work both in Online
Suppose this simple scenario: My client has an already working .net application and he/she
The scenario is like this. I have a rich:tabPanel with about 5 tabs. On
Apparently, I can't find any help on this. I have a scenario where I
My scenario is like this: <ScrollView android:layout_width=fill_parent android:layout_height=fill_parent android:scrollbars=vertical android:background=#ffffeb> <LinearLayout android:orientation=vertical android:layout_width=fill_parent android:layout_height=fill_parent

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.