The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9130339
In Process

The Archive Base Latest Questions

Editorial Team
Asked: June 17, 2026

I have a Python script which reads a file line by line and checks whether each line matches a regular expression.

I would like to improve the performance of that script by memory-mapping the file before I search. I have looked at the mmap example: http://docs.python.org/2/library/mmap.html

My question is: how can I mmap a file when it is too big (15GB) for the memory of my machine (4GB)?

I read the file like this:

fi = open(log_file, 'r', buffering=10*1024*1024)

for line in fi:
    pass  # do something with the line

fi.close()

Since I set the buffer to 10MB, is that the same, in terms of performance, as mmapping 10MB of the file?

Thank you.

1 Answer

    Editorial Team added an answer on June 17, 2026 at 7:56 am

    First, the memory of your machine is irrelevant. It’s the size of your process’s address space that’s relevant. With a 32-bit Python, this will be somewhere under 4GB. With a 64-bit Python, it will be more than enough.

    The reason for this is that mmap isn’t about mapping a file into physical memory, but into virtual memory. An mmapped file becomes just like a special swap file for your program. Thinking about this can get a bit complicated, but the Wikipedia articles on virtual memory and address space should help.

    So, the first answer is “use a 64-bit Python”. But obviously that may not be applicable in your case.
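If a 64-bit Python is available, mapping the whole file could look something like the following minimal sketch. It assumes Python 3 and a bytes pattern (an mmap exposes bytes, not text); the helper names `is_64bit_python` and `find_match_offsets` are hypothetical and only for illustration.

```python
import mmap
import re
import sys


def is_64bit_python():
    # sys.maxsize is 2**63 - 1 on 64-bit builds and 2**31 - 1 on 32-bit builds.
    return sys.maxsize > 2**32


def find_match_offsets(path, pattern):
    """Return the byte offset of every match of a bytes pattern (hypothetical helper)."""
    regex = re.compile(pattern)
    with open(path, "rb") as f:
        # length=0 maps the entire file; mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # The re module can search the mapping directly, without copying it.
            return [match.start() for match in regex.finditer(m)]
```

The operating system pages data in and out as the search walks the mapping, so the 15GB file never has to fit in physical memory at once.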

    The obvious alternative is to map in the first 1GB, search that, unmap it, map in the next 1GB, and so on. You do this by passing the length and offset arguments to the mmap.mmap constructor; note that offset must be a multiple of mmap.ALLOCATIONGRANULARITY. For example:

    m = mmap.mmap(f.fileno(), length=1024*1024*1024, offset=1536*1024*1024)
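One practical detail: the offset passed to `mmap.mmap` has to be a multiple of `mmap.ALLOCATIONGRANULARITY`. The sketch below shows one way to map an arbitrary byte range by rounding the offset down and remembering the difference. It assumes Python 3; `map_range` is a hypothetical helper name, not part of the mmap module.

```python
import mmap


def map_range(f, start, length):
    """Map bytes [start, start + length) of an open binary file (hypothetical helper).

    Returns (mapping, delta); the requested bytes are mapping[delta:delta + length].
    The caller must close the mapping, and the range must lie inside the file.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = start - (start % gran)  # round down to a valid offset
    delta = start - aligned           # extra leading bytes that were mapped
    m = mmap.mmap(f.fileno(), length + delta,
                  access=mmap.ACCESS_READ, offset=aligned)
    return m, delta
```

The 1536MB offset in the example above is already a multiple of the granularity on common platforms, so it needs no adjustment.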
    

    However, a match for your regex could straddle the boundary, with half of it in the first 1GB and half in the second. So, you need to use windowing: map in the first 1GB, search, unmap, then map in a partially-overlapping 1GB, and so on.

    The question is, how much overlap do you need? If you know the maximum possible size of a match, you don’t need anything more than that. And if you don’t know… well, then there is no way to actually solve the problem without breaking up your regex—if that isn’t obvious, imagine how you could possibly find a 2GB match in a single 1GB window.
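The windowing idea can be sketched roughly as follows. This is only an illustration under stated assumptions: Python 3, a bytes pattern that never matches the empty string, no match longer than `max_match_len` bytes, and no anchors or lookarounds that depend on text outside the match. The names `windowed_search`, `step`, and `max_match_len` are hypothetical.

```python
import mmap
import os
import re


def windowed_search(path, pattern, step, max_match_len):
    """Yield (start, end) absolute offsets of matches, one mapped window at a time."""
    if step <= 0 or step % mmap.ALLOCATIONGRANULARITY != 0:
        raise ValueError("step must be a positive multiple of "
                         "mmap.ALLOCATIONGRANULARITY")
    regex = re.compile(pattern)
    size = os.path.getsize(path)
    next_pos = 0  # absolute offset where the next search is allowed to begin
    with open(path, "rb") as f:
        for offset in range(0, size, step):
            # Each window overlaps the next one by max_match_len bytes.
            length = min(step + max_match_len, size - offset)
            m = mmap.mmap(f.fileno(), length,
                          access=mmap.ACCESS_READ, offset=offset)
            results = []
            try:
                # Resume after the previous match so matches never overlap.
                for match in regex.finditer(m, max(0, next_pos - offset)):
                    if match.start() >= step:
                        break  # starts in the next window, which will report it
                    results.append((offset + match.start(),
                                    offset + match.end()))
                    next_pos = offset + match.end()
            finally:
                m.close()
            # Yield only after unmapping, so a paused generator holds no mapping.
            yield from results
```

For the case in the question, a call such as `windowed_search(log_file, rb"...", 1024 * 1024 * 1024, max_len)` would use 1GB steps; each match is reported by the window in which it starts, and the overlap guarantees that window also contains its end.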

    Answering your followup question:

    “Since I set the buffer to 10MB, in terms of performance, is it the same as I mmap 10MB of file?”

    As with any performance question, if it really matters, you need to test it, and if it doesn’t, don’t worry about it.

    If you want me to guess: I think mmap may be faster here, but only because (as J.F. Sebastian implied) looping and calling re.match 128K times as often may cause your code to be CPU-bound instead of IO-bound. But you could optimize that away without mmap, just by using read. So, would mmap be faster than read? Given the sizes involved, I’d expect the performance of mmap to be much faster on old Unix platforms, about the same on modern Unix platforms, and a bit slower on Windows. (You can still get large performance benefits out of mmap over read or read+lseek if you’re using madvise, but that’s not relevant here.) But really, that’s just a guess.
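As a rough illustration of the read-based alternative, the sketch below reads large chunks and runs one multiline regex over each chunk instead of calling re.match once per line. It assumes Python 3, a bytes pattern that cannot match the empty string or span a newline, and that the goal is to count lines that match at their start (as re.match would); `count_matching_lines` and `chunk_size` are hypothetical names.

```python
import re


def count_matching_lines(path, pattern, chunk_size=10 * 1024 * 1024):
    """Count lines whose beginning matches a bytes pattern, chunk by chunk."""
    # With re.MULTILINE, ^ matches at the start of every line in the buffer.
    regex = re.compile(b"^(?:" + pattern + b")", re.MULTILINE)
    count = 0
    tail = b""  # incomplete last line carried over from the previous chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            cut = buf.rfind(b"\n") + 1  # end of the last complete line, or 0
            tail = buf[cut:]
            # Search only the complete lines; the partial line waits for more data.
            count += sum(1 for _ in regex.finditer(buf, 0, cut))
    if tail:
        # The file did not end with a newline; check the final line too.
        count += sum(1 for _ in regex.finditer(tail))
    return count
```

Whether this beats the line-by-line loop or an mmap-based version on a given machine is something only a measurement can settle.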

    The most compelling reason to use mmap is usually that it’s simpler than read-based code, not that it’s faster. When you have to use windowing even with mmap, and when you don’t need to do any seeking with read, this is less compelling, but still, if you try writing the code both ways, I’d expect your mmap code would end up a bit more readable. (Especially if you tried to optimize out the buffer copies from the obvious read solution.)

© 2021 The Archive Base. All Rights Reserved