The Archive Base

Editorial Team
Asked: May 15, 2026 (2026-05-15T05:19:21+00:00)

The following program is essentially the same as the one described here. When I compile and run it with two threads (NTHREADS == 2), I get the following run times:

real        0m14.120s
user        0m25.570s
sys         0m0.050s

When it is run with just one thread (NTHREADS == 1), the run times are significantly better, even though it uses only one core:

real        0m4.705s
user        0m4.660s
sys         0m0.010s

My system is dual core. I know random_r is thread-safe, and I am pretty sure it is non-blocking. When the same program is run with a calculation of sines and cosines in place of random_r, the dual-threaded version runs in about half the time, as expected.

#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>

#define NTHREADS 2
#define PRNG_BUFSZ 8
#define ITERATIONS 1000000000

void* thread_run(void* arg) {
    int r1 = 0, i, totalIterations = ITERATIONS / NTHREADS;
    for (i = 0; i < totalIterations; i++){
        random_r((struct random_data*)arg, &r1);
    }
    printf("%i\n", r1);
    return NULL; /* a void* thread function must return a value */
}

int main(int argc, char** argv) {
    struct random_data* rand_states = (struct random_data*)calloc(NTHREADS, sizeof(struct random_data));
    char* rand_statebufs = (char*)calloc(NTHREADS, PRNG_BUFSZ);
    pthread_t* thread_ids;
    int t = 0;
    thread_ids = (pthread_t*)calloc(NTHREADS, sizeof(pthread_t));
    /* create threads */
    for (t = 0; t < NTHREADS; t++) {
        initstate_r(random(), &rand_statebufs[t], PRNG_BUFSZ, &rand_states[t]);
        pthread_create(&thread_ids[t], NULL, &thread_run, &rand_states[t]);
    }
    for (t = 0; t < NTHREADS; t++) {
        pthread_join(thread_ids[t], NULL);
    }
    free(thread_ids);
    free(rand_states);
    free(rand_statebufs);
}

I am confused about why the two-threaded version performs so much worse than the single-threaded version when generating random numbers, considering that random_r is meant to be used in multi-threaded applications.

1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 5:19 am (2026-05-15T05:19:22+00:00)

    A very simple change to space the data out in memory:

    struct random_data* rand_states = (struct random_data*)calloc(NTHREADS * 64, sizeof(struct random_data));
    char* rand_statebufs = (char*)calloc(NTHREADS*64, PRNG_BUFSZ);
    pthread_t* thread_ids;
    int t = 0;
    thread_ids = (pthread_t*)calloc(NTHREADS, sizeof(pthread_t));
    /* create threads */
    for (t = 0; t < NTHREADS; t++) {
        initstate_r(random(), &rand_statebufs[t*64], PRNG_BUFSZ, &rand_states[t*64]);
        pthread_create(&thread_ids[t], NULL, &thread_run, &rand_states[t*64]);
    }
    

    results in a much faster running time on my dual-core machine.

    This confirms the suspicion the change was meant to test: you are mutating values on the same cache line in two separate threads, and so have cache contention (false sharing). If you don’t know about that yet, Herb Sutter’s talk ‘Machine Architecture: Things Your Programming Language Never Told You’ is worth watching if you’ve got the time. He demonstrates false sharing starting at around 1:20.
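To see the effect in isolation, here is a minimal sketch that is not part of the original program. The names `packed_counters`, `padded_counters`, `bump`, and `run_pair` are made up for illustration, and the 64-byte line size is an assumption. Two threads each increment their own counter; the only difference between the two structs is whether the counters share a cache line.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define LINE 64              /* assumed cache line size in bytes */
#define INCREMENTS 5000000L  /* work per thread; arbitrary choice */

/* Two counters next to each other: both fit in one cache line. */
struct packed_counters {
    volatile long a;
    volatile long b;
};

/* Same counters, padded so that b starts LINE bytes after a. */
struct padded_counters {
    volatile long a;
    char pad[LINE - sizeof(long)];
    volatile long b;
};

/* Thread body: increment the counter passed in, and nothing else. */
static void *bump(void *arg) {
    volatile long *counter = arg;
    for (long i = 0; i < INCREMENTS; i++)
        (*counter)++;
    return NULL;
}

/* Run two threads, one per counter, and return the sum of both counters. */
static long run_pair(volatile long *a, volatile long *b) {
    pthread_t ta, tb;
    int rc = pthread_create(&ta, NULL, bump, (void *)a);
    assert(rc == 0);
    rc = pthread_create(&tb, NULL, bump, (void *)b);
    assert(rc == 0);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return *a + *b;
}
```

Calling `run_pair` on the two fields of a `packed_counters` and then of a `padded_counters` from `main`, and timing each call, should show the packed pair running slower on a multi-core machine. How large the gap is depends on the hardware. Neither thread ever touches the other's counter, so the result is correct either way; only the speed differs.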

    Work out your cache line size, and create each thread’s data so it is aligned to it.
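One way to work out the line size at run time is sketched below, assuming Linux with glibc. `_SC_LEVEL1_DCACHE_LINESIZE` is a glibc extension rather than POSIX, so the code guards it with `#ifdef`. `cache_line_size` and `FALLBACK_LINE_SIZE` are names invented for this example, and the fallback of 64 is an assumption.

```c
#include <stddef.h>
#include <unistd.h>

#define FALLBACK_LINE_SIZE 64  /* assumption: a common value on x86-64 */

/* Return the L1 data cache line size in bytes, or a fallback guess
 * when the system does not report one. */
static size_t cache_line_size(void) {
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    long n = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);  /* glibc extension */
    if (n > 0)
        return (size_t)n;
#endif
    return FALLBACK_LINE_SIZE;
}
```

On the same kind of system, `getconf LEVEL1_DCACHE_LINESIZE` in a shell reports the same value. A run-time value cannot be used to size an array inside a struct, so for struct padding you would still pick a compile-time constant and could use this function to check it.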

    It’s a bit cleaner to put each thread’s data into a struct and align that:

    #define CACHE_LINE_SIZE 64
    
    struct thread_data {
        struct random_data random_data;
        char statebuf[PRNG_BUFSZ];
        char padding[CACHE_LINE_SIZE - sizeof ( struct random_data )-PRNG_BUFSZ];
    };
    
    int main ( int argc, char** argv )
    {
        /* should print CACHE_LINE_SIZE if the padding is right */
        printf ( "%zu\n", sizeof ( struct thread_data ) );
    
        void* apointer;
    
        /* align to the cache line; needs <string.h> for memset below */
        if ( posix_memalign ( &apointer, CACHE_LINE_SIZE, NTHREADS * sizeof ( struct thread_data ) ) )
            exit ( 1 );
    
        struct thread_data* thread_states = apointer;
    
        memset ( apointer, 0, NTHREADS * sizeof ( struct thread_data ) );
    
        pthread_t* thread_ids;
    
        int t = 0;
    
        thread_ids = ( pthread_t* ) calloc ( NTHREADS, sizeof ( pthread_t ) );
    
        /* create threads */
        for ( t = 0; t < NTHREADS; t++ ) {
            initstate_r ( random(), thread_states[t].statebuf, PRNG_BUFSZ, &thread_states[t].random_data );
            pthread_create ( &thread_ids[t], NULL, &thread_run, &thread_states[t].random_data );
        }
    
        for ( t = 0; t < NTHREADS; t++ ) {
            pthread_join ( thread_ids[t], NULL );
        }
    
        free ( thread_ids );
        free ( thread_states );
    }
    

    with CACHE_LINE_SIZE 64:

    refugio:$ gcc -O3 -o bin/nixuz_random_r src/nixuz_random_r.c -lpthread
    refugio:$ time bin/nixuz_random_r 
    64
    63499495
    944240966
    
    real    0m1.278s
    user    0m2.540s
    sys     0m0.000s
    

    Or you can pad each struct to double the cache line size and use plain malloc. malloc is 16-byte (IIRC) rather than 64-byte aligned, so a struct may start partway through a line, but the extra padding still ensures that the memory each thread mutates is on separate lines.
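The double-padding variant can be checked with a little address arithmetic. This is a sketch under stated assumptions: `wide_slot`, `line_of`, and `hot_regions_disjoint` are hypothetical names, the line size is taken as 64, and `HOT_BYTES` of 56 stands in for the per-thread data that gets mutated (it must be no larger than one line for the argument to hold).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_LINE_SIZE 64  /* assumed line size */
#define HOT_BYTES 56        /* assumed size of the data each thread mutates */

/* Hypothetical slot: mutated data first, then padding up to two lines. */
struct wide_slot {
    char hot[HOT_BYTES];
    char padding[2 * CACHE_LINE_SIZE - HOT_BYTES];
};

/* Index of the cache line that contains address p. */
static uintptr_t line_of(const void *p) {
    return (uintptr_t)p / CACHE_LINE_SIZE;
}

/* Return 1 if the hot regions of slots i and i+1 share no cache line. */
static int hot_regions_disjoint(const struct wide_slot *slots, size_t i) {
    uintptr_t last_of_i = line_of(&slots[i].hot[HOT_BYTES - 1]);
    uintptr_t first_of_next = line_of(&slots[i + 1].hot[0]);
    return last_of_i < first_of_next;
}
```

For an array obtained with `malloc(n * sizeof(struct wide_slot))`, `hot_regions_disjoint(slots, i)` should return 1 for every adjacent pair whatever alignment malloc happens to return: a hot region of at most 64 bytes spans at most two lines, and the next slot starts 128 bytes later. The cost is twice the memory per thread compared with the aligned version.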

    (I reduced ITERATIONS by a factor of ten for these timings; I don’t have a stupidly fast machine.)
