The Archive Base

Editorial Team
Asked: June 10, 2026

I am thinking about implementing a wrapper for MPI that imitates OpenMP’s way
of parallelizing for loops.

  begin_parallel_region( chunk_size=100 , num_proc=10 );

  for( int i=0 ; i<1000 ; i++ )
  {
       //some computation 
  }

  end_parallel_region();

The code above distributes the computation inside the for loop to 10 slave MPI processes.
Upon entering the parallel region, the chunk size and the number of slave processes are provided.
Upon leaving the parallel region, the MPI processes are synchronized and put idle.

EDITED in response to High Performance Mark.

I have no intention of simulating OpenMP’s shared-memory model.
I propose this because I need it.
I am developing a library that is required to build graphs from mathematical functions.
In these mathematical functions, there are often for loops like the one below.

  for( int i=0 ; i<n ; i++ )
  {
       s = s + sin(x[i]);
  }

So I first want to be able to distribute sin(x[i]) to the slave processes and, at the end, reduce the results to a single variable, just like in OpenMP.

I was wondering if there is such a wrapper out there so that I don’t have to reinvent the wheel.

Thanks.

1 Answer

Editorial Team
Added an answer on June 10, 2026 at 3:24 pm

    Yes, you could do this, for specific tasks. But you shouldn’t.

    Consider how you might implement this; the begin part would distribute the data, and the end part would bring the answer back:

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>
    #include <mpi.h>
    
    typedef struct state_t {
        int globaln;
        int localn;
        int *locals;
        int *offsets;
        double *localin;
        double *localout;
        double (*map)(double);
    } state;
    
    state *begin_parallel_mapandsum(double *in, int n, double (*map)(double)) {
        state *s = malloc(sizeof(state));
        s->globaln = n;
        s->map = map;
    
        /* figure out decomposition */
    
        int size, rank;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
        s->locals  = malloc(size * sizeof(int));
        s->offsets = malloc(size * sizeof(int));
    
        s->offsets[0] = 0;
    
        for (int i=0; i<size; i++) {
            s->locals[i] = (n+i)/size;
            if (i < size-1) s->offsets[i+1] = s->offsets[i] + s->locals[i];
        }
    
        /* allocate local arrays */
        s->localn   = s->locals[rank];
        s->localin  = malloc(s->localn*sizeof(double));
        s->localout = malloc(s->localn*sizeof(double));
    
    
        /* distribute */
        MPI_Scatterv( in, s->locals, s->offsets, MPI_DOUBLE,
                      s->localin, s->locals[rank], MPI_DOUBLE,
                      0, MPI_COMM_WORLD);
    
        return s;
    }
    
    double  end_parallel_mapandsum(state **s) {
        double localanswer=0., answer;
    
        /* sum up local answers */
        for (int i=0; i<((*s)->localn); i++) {
            localanswer += ((*s)->localout)[i];
        }
    
        /* and get global result.  Everyone gets answer */
        MPI_Allreduce(&localanswer, &answer, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    
        free( (*s)->localin );
        free( (*s)->localout );
        free( (*s)->locals );
        free( (*s)->offsets );
        free( (*s) );
        *s = NULL;   /* the caller's pointer no longer refers to freed memory */
    
        return answer;
    }
    
    
    int main(int argc, char **argv) {
        int rank;
        double *inputs = NULL;   /* allocated on rank 0 only; MPI_Scatterv ignores it on other ranks */
        double result;
        int n=100;
        const double pi=4.*atan(1.);
    
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
        if (rank == 0) {
            inputs = malloc(n * sizeof(double));
            for (int i=0; i<n; i++) {
                inputs[i] = 2.*pi/n*i;
            }
        }
    
        state *s=begin_parallel_mapandsum(inputs, n, sin);
    
        for (int i=0; i<s->localn; i++) {
            s->localout[i] = (s->map)(s->localin[i]);
        }
    
        result = end_parallel_mapandsum(&s);
    
        if (rank == 0) {
            printf("Calculated result: %lf\n", result);
            double trueresult = 0.;
            for (int i=0; i<n; i++) trueresult += sin(inputs[i]);
            printf("True  result: %lf\n", trueresult);
            free(inputs);   /* only rank 0 allocated the input array */
        }
    
        MPI_Finalize();
        return 0;
    }
    

    That constant distribute/gather is a terrible communications burden to sum up a few numbers, and is antithetical to the entire distributed-memory computing model.
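As a side note on the listing above, the decomposition rule `locals[i] = (n+i)/size` can be checked in isolation. The sketch below pulls it into a standalone helper; the name `block_decompose` is hypothetical and is not part of MPI or of the original code.

```c
/* Fill counts[] and offsets[] (each of length `size`) for splitting n items
   across `size` ranks, using the same rule as the listing above:
   counts[i] = (n + i) / size with integer division.
   Returns the total number of items assigned, which equals n. */
int block_decompose(int n, int size, int *counts, int *offsets)
{
    int total = 0;
    for (int i = 0; i < size; i++) {
        counts[i]  = (n + i) / size;   /* chunk length for rank i */
        offsets[i] = total;            /* start index of rank i's chunk */
        total += counts[i];
    }
    return total;
}
```

For n = 10 and size = 4 this gives counts 2, 2, 3, 3 and offsets 0, 2, 4, 7, so the chunks differ by at most one element and cover every index exactly once. These are the arrays that `MPI_Scatterv` takes as its send counts and displacements.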

    To a first approximation, shared-memory approaches (OpenMP, pthreads, IPP, what have you) are about scaling computations faster: throwing more processors at the same chunk of memory. Distributed-memory computing, on the other hand, is about scaling a computation bigger: using more resources, particularly memory, than can be found on a single computer. The big win of using MPI comes when you’re dealing with problem sets that can’t fit in any one node’s memory, ever. So when doing distributed-memory computing, you avoid having all the data in any one place.

    It’s important to keep that basic approach in mind even when you are just using MPI on-node to use all the processors. The above scatter/gather approach will just kill performance. The more idiomatic distributed-memory computing approach is for the logic of the program to already have distributed the data – that is, your begin_parallel_region and end_parallel_region above would have already been built into the code above your loop at the very beginning. Then, every loop is just

    for( int i=0 ; i<localn ; i++ )
    {
        s = s + sin(x[i]);
    }
    

    and when you need to exchange data between tasks (or reduce a result, or what have you) then you call the MPI functions to do those specific tasks.

© 2021 The Archive Base. All Rights Reserved