The Archive Base

Editorial Team
Asked: June 18, 2026


I have written a program that can be summarized as follows:

import multiprocessing

def loadHugeData():
    # load it (details elided)
    return data

def processHugeData(data, res_queue):
    for item in data:
        # process it (details elided), producing result
        res_queue.put(result)
    # sentinel that tells the writer process to stop
    res_queue.put("END")

def writeOutput(outFile, res_queue):
    with open(outFile, 'w') as f:
        # keep writing results until the sentinel arrives
        res = res_queue.get()
        while res != 'END':
            f.write(res)
            res = res_queue.get()

res_queue = multiprocessing.Queue()

if __name__ == '__main__':
    data = loadHugeData()
    p = multiprocessing.Process(target=writeOutput, args=(outFile, res_queue))
    p.start()
    processHugeData(data, res_queue)
    p.join()

The real code (especially writeOutput()) is a lot more complicated. writeOutput() only uses the values it takes as arguments (meaning it does not reference data).

Basically, it loads a huge dataset into memory and processes it. Writing the output is delegated to a sub-process (it actually writes into multiple files, and this takes a lot of time).
So each time a data item is processed, the result is sent through res_queue to the sub-process, which writes it into files as needed.

The sub-process does not need to access, read or modify the data loaded by loadHugeData() in any way. It only needs what the main process sends it through res_queue. This leads me to my problem and question.

Judging by memory usage in top, it seems that the sub-process gets its own copy of the huge dataset. Is this true? If so, how can I avoid it (essentially using double the memory)?

I am using Python 2.6 and the program is running on Linux.

1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 2:38 pm

    On Linux, the multiprocessing module creates child processes with the fork system call, which makes a copy of the current process. Because you load the huge data before forking (that is, before starting the multiprocessing.Process), the child process inherits a copy of the data.

    However, if the operating system implements COW (copy-on-write), as Linux does, parent and child share the same physical memory pages, albeit in different virtual address spaces. Only one copy of the data exists in physical memory until either process modifies it, and even then additional memory is allocated only for the modified pages (in page-size increments).

    You can avoid this situation by creating and starting the multiprocessing.Process before you load your huge data. Memory that the parent allocates after the fork is never mapped into the child, so loading the data in the parent does not affect the child's memory usage.
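A minimal sketch of that reordering, assuming Python 3 on Linux rather than the Python 2.6 from the question. The function names, the stand-in data, and the `run()` wrapper are hypothetical, and `multiprocessing.get_context("fork")` is used only to make the fork-based behaviour explicit:

```python
import multiprocessing


def load_huge_data():
    # Hypothetical stand-in for the real loader from the question.
    return ["line %d\n" % i for i in range(1000)]


def process_huge_data(data, res_queue):
    for item in data:
        # Placeholder "processing": upper-case each item.
        res_queue.put(item.upper())
    # Sentinel that tells the writer process to stop.
    res_queue.put("END")


def write_output(out_file, res_queue):
    with open(out_file, "w") as f:
        # iter() with a sentinel calls get() until "END" arrives.
        for res in iter(res_queue.get, "END"):
            f.write(res)


def run(out_file):
    # Request the fork start method explicitly (an assumption of this sketch).
    ctx = multiprocessing.get_context("fork")
    res_queue = ctx.Queue()
    writer = ctx.Process(target=write_output, args=(out_file, res_queue))
    # The fork happens here, before the huge data exists in the parent.
    writer.start()
    # Loaded after the fork, so these pages belong to the parent only.
    data = load_huge_data()
    process_huge_data(data, res_queue)
    writer.join()
    return writer.exitcode
```

Calling `run()` with an output path starts the writer first and loads the data afterwards. The only change from the question's layout is the order of `start()` and the load.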

    Edit: reflecting @Janne Karila’s comment in the answer, as it is so relevant: "Note also that every Python object contains a reference count that is modified whenever the object is accessed. So, just reading a data structure can cause COW to copy."
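A small illustration of the reference-count point, assuming CPython; the helper name `refcount_delta` is hypothetical. Taking one more reference to an object changes the count stored inside the object itself, so the memory page holding it is written to even though the data is never modified:

```python
import sys


def refcount_delta():
    item = object()
    holder = []
    before = sys.getrefcount(item)
    # No mutation of item itself: we only store another reference to it.
    holder.append(item)
    after = sys.getrefcount(item)
    return after - before
```

In a forked child, a write like this to a shared page is what would trigger a copy of that page.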

© 2021 The Archive Base. All Rights Reserved