The Archive Base


Editorial Team
Asked: May 13, 2026 (2026-05-13T07:15:25+00:00)

I got some huge files I need to parse, and people have been recommending

I have some huge files I need to parse, and people have been recommending mmap because it should avoid having to load the entire file into memory.

But looking at ‘top’, it does look like I’m loading the entire file into memory, so I think I must be doing something wrong. ‘top’ shows >2.1 gig.

This is a code snippet that shows what I’m doing.

Thanks

#include <stdio.h>
#include <stdlib.h>
#include <err.h>
#include <fcntl.h>
#include <sysexits.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <string.h>
int main (int argc, char *argv[] ) {
  struct stat sb;
  char *p;
  if (argc < 2) {
    fprintf (stderr, "usage: %s <file>\n", argv[0]);
    return 1;
  }
  //open filedescriptor
  int fd = open (argv[1], O_RDONLY);
  if (fd == -1) {
    perror ("open");
    return 1;
  }
  //initialize a stat for getting the filesize
  if (fstat (fd, &sb) == -1) {
    perror ("fstat");
    return 1;
  }
  //mmap rejects a zero-length mapping, so handle empty files here
  if (sb.st_size == 0) {
    fprintf (stderr, "numlines:0\n");
    close (fd);
    return 0;
  }
  //do the actual mmap, and keep pointer to the first element
  p = (char *) mmap (0, sb.st_size, PROT_READ, MAP_SHARED, fd, 0);
  //something went wrong
  if (p == MAP_FAILED) {
    perror ("mmap");
    return 1;
  }
  //lets just count the number of lines; a mapped file is not guaranteed
  //to end in '\0', so stop at the file size instead of looking for one
  size_t numlines = 0;
  for (off_t i = 0; i < sb.st_size; i++)
    if (p[i] == '\n')
      numlines++;
  fprintf (stderr, "numlines:%zu\n", numlines);
  //unmap it
  if (munmap (p, sb.st_size) == -1) {
    perror ("munmap");
    return 1;
  }
  if (close (fd) == -1) {
    perror ("close");
    return 1;
  }
  return 0;
}

1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 7:15 am

    No, what you’re doing is mapping the file into memory. This is different to actually reading the file into memory.

    Were you to read it in, you would have to transfer the entire contents into memory. By mapping it, you let the operating system handle it. If you attempt to read or write to a location in that memory area, the OS will load the relevant section for you first. It will not load the entire file unless the entire file is needed.

    That is where you get your performance gain. If you map the entire file but only change one byte then unmap it, you’ll find that there’s not much disk I/O at all.

    Of course, if you touch every byte in the file, then yes, it will all be loaded at some point but not necessarily in physical RAM all at once. But that’s the case even if you load the entire file up front. The OS will swap out parts of your data if there’s not enough physical memory to contain it all, along with that of the other processes in the system.
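One way to watch this lazy loading is to ask the kernel which pages of a mapping are resident at a given moment. The sketch below assumes Linux with glibc, where mincore() fills an unsigned char vector with one entry per page. resident_pages is a hypothetical helper name, not something from the question.

```c
#define _DEFAULT_SOURCE          /* exposes mincore() on glibc */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* Hypothetical helper: returns how many pages of the page-aligned range
   [addr, addr + len) the kernel reports as resident, or -1 on error. */
long resident_pages(void *addr, size_t len) {
  assert(addr != NULL && len > 0);

  long pagesize = sysconf(_SC_PAGESIZE);
  if (pagesize <= 0) return -1;

  /* mincore() needs one vector byte per page, rounding the length up */
  size_t npages = (len + (size_t)pagesize - 1) / (size_t)pagesize;
  unsigned char *vec = malloc(npages);
  if (vec == NULL) return -1;

  long count = -1;
  if (mincore(addr, len, vec) == 0) {
    count = 0;
    for (size_t i = 0; i < npages; i++)
      count += vec[i] & 1;       /* low bit set means the page is resident */
  }
  free(vec);
  return count;
}
```

Calling it right after mmap and again after the counting loop gives a rough picture of how much of the file is actually in RAM. Treat the number as a hint: the kernel may report pages as resident for other reasons, for example because an earlier read left them in the page cache.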

    The main advantages of memory mapping are:

    • You defer reading the file sections until they’re needed (and, if they’re never needed, they don’t get loaded). So there’s no big upfront cost as you load the entire file. It amortises the cost of loading.
    • The writes are automated; you don’t have to write out every byte. With a writable shared mapping, just unmap it and the OS will write out the changed sections (call msync first if you need the data on disk at a known point). I think this also happens when the memory is swapped out (in low physical memory situations), since your buffer is simply a window onto the file.
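The "map the entire file but only change one byte" case can be sketched as follows. This is a minimal illustration, assuming a POSIX system and an existing, non-empty file. patch_byte is a hypothetical helper name, and the explicit msync is there only to force the write-back at a known point.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical helper: overwrites one byte of an existing file through a
   shared mapping. Returns 0 on success, -1 on failure. */
int patch_byte(const char *path, off_t offset, char value) {
  assert(path != NULL);

  int fd = open(path, O_RDWR);
  if (fd == -1) return -1;

  struct stat sb;
  if (fstat(fd, &sb) == -1 || offset < 0 || offset >= sb.st_size) {
    close(fd);
    return -1;
  }

  /* MAP_SHARED is what makes stores reach the file; MAP_PRIVATE would not */
  char *p = mmap(NULL, (size_t)sb.st_size, PROT_READ | PROT_WRITE,
                 MAP_SHARED, fd, 0);
  close(fd);                     /* the mapping stays valid without the fd */
  if (p == MAP_FAILED) return -1;

  p[offset] = value;             /* only the page holding this byte is touched */

  int rc = msync(p, (size_t)sb.st_size, MS_SYNC);
  if (munmap(p, (size_t)sb.st_size) == -1) rc = -1;
  return rc;
}
```

Even though the whole file is mapped, only the page containing the modified byte has to be read in and written back.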

    Keep in mind that there is most likely a disconnect between your address space usage and your physical memory usage. If the figure you see in top is the virtual size (the VIRT column), it counts the whole mapping whether or not its pages are in RAM. You can allocate an address space of 4G (ideally, though there may be OS, BIOS or hardware limitations) on a 32-bit machine with only 1G of RAM. The OS handles the paging to and from disk.
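If address space itself is the limit (say, a file of more than 2 GB in a 32-bit process), one option is to map the file a window at a time instead of all at once. The following is a rough sketch under that assumption. count_lines_windowed and its window_pages parameter are hypothetical names, and the window size is a tuning choice rather than anything prescribed by mmap.

```c
#define _FILE_OFFSET_BITS 64     /* 64-bit off_t on 32-bit glibc systems */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

/* Hypothetical helper: counts '\n' bytes in a file while mapping at most
   window_pages pages at any one time. Returns -1 on error. */
long long count_lines_windowed(const char *path, size_t window_pages) {
  assert(path != NULL && window_pages > 0);

  long pagesize = sysconf(_SC_PAGESIZE);
  if (pagesize <= 0) return -1;

  int fd = open(path, O_RDONLY);
  if (fd == -1) return -1;

  struct stat sb;
  if (fstat(fd, &sb) == -1) {
    close(fd);
    return -1;
  }

  /* mmap offsets must be page-aligned, so the window is a whole number of pages */
  size_t window = window_pages * (size_t)pagesize;
  long long lines = 0;

  for (off_t off = 0; off < sb.st_size; off += (off_t)window) {
    off_t remaining = sb.st_size - off;
    size_t len = remaining < (off_t)window ? (size_t)remaining : window;

    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, off);
    if (p == MAP_FAILED) {
      close(fd);
      return -1;
    }
    for (size_t i = 0; i < len; i++)
      if (p[i] == '\n')
        lines++;
    munmap(p, len);              /* release this window before mapping the next */
  }

  close(fd);
  return lines;
}
```

On a 64-bit system a single mapping is usually simpler. The windowed form only matters when the mapping would not fit in the address space.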

    And to answer your further request for clarification:

    Just to clarify: so if I need the entire file, mmap will actually load the entire file?

    Yes, but it may not be in physical memory all at once. The OS will swap out bits back to the filesystem in order to bring in new bits.

    But it will also do that if you’ve read the entire file in manually. The difference between those two situations is as follows.

    With the file read into memory manually, the OS will swap parts of your address space (which may or may not include the data) out to the swap file. And you will need to manually rewrite the file when you’re finished with it.

    With memory mapping, you have effectively told the OS to use the original file as an extra swap area for that file/memory only. And, when data is written to that swap area, it affects the actual file immediately. So there’s no need to manually rewrite anything when you’re done, and the normal swap is not affected (usually).

    It really is just a window to the file:

    [Image: memory mapped file]

© 2021 The Archive Base. All Rights Reserved