The Archive Base

Editorial Team
Asked: June 13, 2026


I am learning to program in MPI and I came across this question. Let's say I have a .txt file with 100,000 rows/lines; how do I chunk them for processing by 4 processors? That is, I want processor 0 to handle lines 0-25000, processor 1 to handle lines 25001-50000, and so on. I did some searching and came across MPI_File_seek, but I am not sure whether it works on .txt files and supports fscanf afterwards.

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 1:43 am

    Text isn’t a great format for parallel processing, precisely because you don’t know ahead of time where (say) line 25001 begins. These problems are therefore often handled by a preprocessing step: either building an index of line offsets or partitioning the file into one chunk per process.
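
    As a rough illustration of the indexing idea, the sketch below assumes the text is already in memory. It records the byte offset at which each line starts and then hands each rank a contiguous range of lines. The names build_line_index and lines_for_rank are made up for this example and are not part of MPI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: return a malloc'd array holding the byte offset at
 * which each line of buf starts, and store the number of lines in *nlines.
 * A final line without a trailing '\n' still counts as a line.
 * The caller frees the returned array. */
size_t *build_line_index(const char *buf, size_t len, size_t *nlines) {
    size_t count = 0;

    /* first pass: count the line starts */
    for (size_t i = 0; i < len; i++)
        if (i == 0 || buf[i - 1] == '\n') count++;

    size_t *index = malloc((count ? count : 1) * sizeof *index);
    if (index == NULL) { *nlines = 0; return NULL; }

    /* second pass: record the offsets */
    size_t k = 0;
    for (size_t i = 0; i < len; i++)
        if (i == 0 || buf[i - 1] == '\n') index[k++] = i;

    *nlines = count;
    return index;
}

/* Hypothetical helper: split nlines lines as evenly as possible over size
 * ranks. Rank gets the lines [*first, *first + *count); the first
 * nlines % size ranks receive one extra line each. */
void lines_for_rank(size_t nlines, int rank, int size,
                    size_t *first, size_t *count) {
    assert(size > 0 && rank >= 0 && rank < size);
    size_t base  = nlines / (size_t)size;
    size_t extra = nlines % (size_t)size;
    size_t r     = (size_t)rank;

    *count = base + (r < extra ? 1 : 0);
    *first = r * base + (r < extra ? r : extra);
}
```

    With 100,000 lines and 4 ranks, lines_for_rank gives rank 1 the zero-based lines 25000 through 49999, and index[first] is the byte offset at which that rank would begin reading.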

    If you really want to do it through MPI, I’d suggest using MPI-IO to read overlapping chunks of the text file onto the various processors, where the overlap is much longer than you expect your longest line to be, and then have neighbouring processors agree on where to start; e.g., you could say that the first (or last) newline in the overlap region shared by processes N and N+1 is where process N leaves off and N+1 starts.

    To follow this up with some code,

    #include <stdio.h>
    #include <mpi.h>
    #include <stdlib.h>
    #include <ctype.h>
    #include <string.h>
        
    void parprocess(MPI_File *in, MPI_File *out, const int rank, const int size, const int overlap) {
        MPI_Offset globalstart;
        int mysize;
        char *chunk;
        
        /* read in relevant chunk of file into "chunk",
         * which starts at location in the file globalstart
         * and has size mysize 
         */
        {
            MPI_Offset globalend;
            MPI_Offset filesize;
        
            /* figure out who reads what */
            MPI_File_get_size(*in, &filesize);
            filesize--;  /* drop the final byte (usually the trailing newline) */
            mysize = filesize/size;
            globalstart = (MPI_Offset)rank * mysize;  /* widen before multiplying */
            globalend   = globalstart + mysize - 1;
            if (rank == size-1) globalend = filesize-1;
        
            /* add overlap to the end of everyone's chunk except last proc... */
            if (rank != size-1)
                globalend += overlap;
        
            mysize =  globalend - globalstart + 1;
        
            /* allocate memory */
            chunk = malloc( (mysize + 1)*sizeof(char));
            if (chunk == NULL) MPI_Abort(MPI_COMM_WORLD, 1);  /* out of memory */
        
            /* everyone reads in their part */
            MPI_File_read_at_all(*in, globalstart, chunk, mysize, MPI_CHAR, MPI_STATUS_IGNORE);
            chunk[mysize] = '\0';
        }
        
        
        /*
         * everyone calculate what their start and end *really* are by going 
         * from the first newline after start to the first newline after the
         * overlap region starts (eg, after end - overlap + 1)
         */
        
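        int locstart=0, locend=mysize-1;
        if (rank != 0) {
            /* skip the partial first line; the previous rank owns it */
            while(locstart < mysize && chunk[locstart] != '\n') locstart++;
            locstart++;
        }
        if (rank != size-1) {
            /* begin the search at the next rank's first byte (end - overlap + 1)
             * so that both neighbours find the same newline */
            locend = locend - overlap + 1;
            while(locend < mysize-1 && chunk[locend] != '\n') locend++;
        }
        mysize = locend-locstart+1;
        if (mysize < 0) mysize = 0;  /* no complete line in this chunk */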
        
        /* "Process" our chunk by replacing non-space characters with '1' for
         * rank 0, '2' for rank 1, etc... 
         */
        
        for (int i=locstart; i<=locend; i++) {
            char c = chunk[i];
            /* isspace() requires a value representable as unsigned char */
            chunk[i] = ( isspace((unsigned char)c) ? c : '1' + (char)rank );
        }
    
        
        /* output the processed file */
        
        MPI_File_write_at_all(*out, (MPI_Offset)(globalstart+(MPI_Offset)locstart), &(chunk[locstart]), mysize, MPI_CHAR, MPI_STATUS_IGNORE);
        
        free(chunk);
        return;
    }
        
    int main(int argc, char **argv) {
        
        MPI_File in, out;
        int rank, size;
        int ierr;
        const int overlap = 100;
        
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        
        if (argc != 3) {
            if (rank == 0) fprintf(stderr, "Usage: %s infilename outfilename\n", argv[0]);
            MPI_Finalize();
            exit(1);
        }
        
        ierr = MPI_File_open(MPI_COMM_WORLD, argv[1], MPI_MODE_RDONLY, MPI_INFO_NULL, &in);
        if (ierr) {
            if (rank == 0) fprintf(stderr, "%s: Couldn't open file %s\n", argv[0], argv[1]);
            MPI_Finalize();
            exit(2);
        }
        
        ierr = MPI_File_open(MPI_COMM_WORLD, argv[2], MPI_MODE_CREATE|MPI_MODE_WRONLY, MPI_INFO_NULL, &out);
        if (ierr) {
            if (rank == 0) fprintf(stderr, "%s: Couldn't open output file %s\n", argv[0], argv[2]);
            MPI_Finalize();
            exit(3);
        }
        
        parprocess(&in, &out, rank, size, overlap);
        
        MPI_File_close(&in);
        MPI_File_close(&out);
        
        MPI_Finalize();
        return 0;
    }
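
    The boundary rule can also be checked without MPI. The sketch below is a plain-C simulation rather than the MPI code itself: it assumes the whole text is visible in one buffer and computes the byte range each rank would own when both neighbours search forward from the same nominal boundary for the next newline. The names next_newline and owned_range are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset of the first '\n' at or after pos,
 * or len if there is none. */
static size_t next_newline(const char *buf, size_t len, size_t pos) {
    while (pos < len && buf[pos] != '\n') pos++;
    return pos;
}

/* Compute the half-open byte range [*start, *end) that rank owns when a
 * buffer of len bytes is split over size ranks. Each nominal boundary
 * rank * (len / size) is pushed forward to just past the next newline, so
 * rank N's end always equals rank N+1's start and no line is split or lost.
 * In a real MPI run each rank would see only its own chunk plus overlap. */
void owned_range(const char *buf, size_t len, int rank, int size,
                 size_t *start, size_t *end) {
    assert(size > 0 && rank >= 0 && rank < size);
    size_t chunk = len / (size_t)size;
    size_t lo = (size_t)rank * chunk;        /* nominal start */
    size_t hi = (size_t)(rank + 1) * chunk;  /* next rank's nominal start */

    if (rank == 0) {
        *start = 0;
    } else {
        size_t nl = next_newline(buf, len, lo);
        *start = (nl < len) ? nl + 1 : len;
    }

    if (rank == size - 1) {
        *end = len;
    } else {
        size_t nl = next_newline(buf, len, hi);
        *end = (nl < len) ? nl + 1 : len;
    }
}
```

    Calling owned_range for every rank in turn and checking that each end matches the next start is a cheap way to confirm that the ranges tile the file before running the parallel version.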
    

    Running this on a narrow version of the text of the question, we get

    $ mpirun -n 3 ./textio foo.in foo.out
    $ paste foo.in foo.out
    Hi guys I am learning to            11 1111 1 11 11111111 11
    program in MPI and I came           1111111 11 111 111 1 1111
    across this question. Lets          111111 1111 111111111 1111
    say I have a .txt file with         111 1 1111 1 1111 1111 1111
    100,000 rows/lines, how do          1111111 11111111111 111 11
    I chunk them for processing         1 11111 1111 111 1111111111
    by 4 processors? i.e. I want        22 2 22222222222 2222 2 2222
    to let processor 0 take care        22 222 222222222 2 2222 2222
    of the processing for lines         22 222 2222222222 222 22222
    0-25000, processor 1 to take        22222222 222222222 2 22 2222
    care of 25001-50000 and so          2222 22 22222222222 222 22
    on. I did some searching and        333 3 333 3333 333333333 333
    did came across MPI_File_seek       333 3333 333333 3333333333333
    but I am not sure can it work       333 3 33 333 3333 333 33 3333
    on .txt and supports fscanf         33 3333 333 33333333 333333
    afterwards.                         33333333333
    
© 2021 The Archive Base. All Rights Reserved