Editorial Team
Asked: May 28, 2026

I’m using IPython.parallel to process a large amount of data on a cluster. The remote function I run looks like:

def evalPoint(point, theta):
    # do some complex calculation
    return (cost, grad)

which is invoked by this function:

def eval(theta, client, lview, data):
    async_results = []
    for point in data:
        # evaluate current data point
        ar = lview.apply_async(evalPoint, point, theta)
        async_results.append(ar)

    # wait for all results to come back
    client.wait(async_results)

    # and retrieve their values
    values = [ar.get() for ar in async_results]

    # unzip data from original tuple
    totalCost, totalGrad = zip(*values)

    avgGrad = np.mean(totalGrad, axis=0)
    avgCost = np.mean(totalCost, axis=0)

    return (avgCost, avgGrad)

If I run the code:

client = Client(profile="ssh")
client[:].execute("import numpy as np")

lview = client.load_balanced_view()

for i in xrange(100):
    eval(theta, client, lview, data)

the memory usage keeps growing until I eventually run out (76GB of memory). I’ve simplified evalPoint to do nothing in order to make sure it wasn’t the culprit.

The first part of eval was copied from IPython’s documentation on how to use the load balancer. The second part (unzipping and averaging) is fairly straightforward, so I don’t think that’s responsible for the memory leak. Additionally, I’ve tried manually deleting objects in eval and calling gc.collect(), with no luck.

I was hoping someone with IPython.parallel experience could point out something obvious I’m doing wrong, or could confirm that this is in fact a memory leak.

Some additional facts:

  • I’m using Python 2.7.2 on Ubuntu 11.10
  • I’m using IPython version 0.12
  • I have engines running on servers 1-3, and the client and hub running on server 1. I get similar results if I keep everything on just server 1.
  • The only thing I’ve found similar to a memory leak for IPython had to do with %run, which I believe was fixed in this version of IPython (also, I am not using %run)

Update

Also, I tried switching logging from memory to SQLiteDB, in case that was the problem, but still have the same problem.

Response (1)

The memory consumption is definitely in the controller (I could verify this by: (a) running the client on another machine, and (b) watching top). I hadn’t realized that non SQLiteDB would still consume memory, so I hadn’t bothered purging.

If I use DictDB and purge, I still see the memory consumption go up, but at a much slower rate. It was hovering around 2GB for 20 invocations of eval().

If I use MongoDB and purge, it looks like mongod is taking around 4.5GB of memory and ipcluster about 2.5GB.

If I use SQLite and try to purge, I get the following error:

File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/hub.py", line 1076, in purge_results
  self.db.drop_matching_records(dict(completed={'$ne':None}))
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 359, in drop_matching_records
  expr,args = self._render_expression(check)
File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/controller/sqlitedb.py", line 296, in _render_expression
  expr = "%s %s"%null_operators[op]
TypeError: not enough arguments for format string

So, I think if I use DictDB, I might be okay (I’m going to try a run tonight). I’m not sure if some memory consumption is still expected or not (I also purge in the client like you suggested).

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 3:04 am

    Is it the controller process that is growing, or the client, or both?

    The controller remembers all requests and all results, so the default behavior of storing this information in a simple dict will result in constant growth. Using a db backend (sqlite or preferably mongodb if available) should address this, or the client.purge_results() method can be used to instruct the controller to discard any/all of the result history (this will delete them from the db if you are using one).
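
As a minimal sketch of the purge call mentioned above: `client.purge_results('all')` asks the controller to drop its stored history. The traceback in the question suggests the hub only drops records whose `completed` field is set. The `FakeClient` class below is a hypothetical stand-in that exists only so the helper can be exercised without a running cluster; with a real `Client` you would pass that instead.

```python
def purge_controller_history(client):
    """Ask the controller (hub) to forget its stored requests and results."""
    # 'all' is the shorthand for "every job the hub has a record of".
    client.purge_results('all')


class FakeClient(object):
    """Hypothetical stand-in for IPython.parallel.Client (illustration only)."""

    def __init__(self):
        self.purged = []

    def purge_results(self, jobs=(), targets=()):
        # Record what was requested instead of contacting a hub.
        self.purged.append(jobs)


fake = FakeClient()
purge_controller_history(fake)
```

Calling this once per outer iteration (for example, after each call to eval) keeps the controller's history from accumulating across iterations.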

    The client itself caches all of its own results in its results dict, so this, too, will result in growth over time. Unfortunately, this one is a bit harder to get a handle on, because references can propagate in all sorts of directions, and it is not affected by the controller’s db backend.

    This is a known issue in IPython, but for now, you should be able to clear the references manually by deleting the entries in the client’s results and metadata dicts. If your view is sticking around, it has its own results dict that needs clearing as well:

    # ...
    # and retrieve their values
    values = [ar.get() for ar in async_results]
    
    # clear references to the local cache of results:
    for ar in async_results:
        for msg_id in ar.msg_ids:
            del lview.results[msg_id]
            del client.results[msg_id]
            del client.metadata[msg_id]
    

    Or, you can purge the entire client-side cache with a simple dict.clear():

    lview.results.clear()
    client.results.clear()
    client.metadata.clear()
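
A slightly more defensive variant of the per-message cleanup can be sketched as a helper. It uses `dict.pop(key, None)` instead of `del`, so an id that is missing from one of the caches does not raise `KeyError`. The helper name `forget_results` and the `Stub` class are hypothetical; `Stub` only mimics the attributes used here (`results`, `metadata`, `client`, `msg_ids`) so the sketch runs without a cluster.

```python
def forget_results(view, async_results):
    """Drop client-side cached entries for the given AsyncResult objects."""
    client = view.client
    for ar in async_results:
        for msg_id in ar.msg_ids:
            # pop(..., None) tolerates ids that are already gone.
            view.results.pop(msg_id, None)
            client.results.pop(msg_id, None)
            client.metadata.pop(msg_id, None)


class Stub(object):
    """Hypothetical attribute bag standing in for Client/View/AsyncResult."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)


client = Stub(results={'a': 1, 'b': 2}, metadata={'a': {}, 'b': {}})
view = Stub(client=client, results={'a': 1})
finished = [Stub(msg_ids=['a'])]

forget_results(view, finished)
```

Only the entries for the finished message ids are removed, so results that are still outstanding stay available.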
    

    Side note:

    Views have their own wait() method, so you shouldn’t need to pass the Client to your function at all. Everything should be accessible via the View, and if you really need the client (e.g. for purging the cache), you can get it as view.client.
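
To illustrate the side note, here is a sketch of the submit-and-wait part of eval that takes only a view. The function name `eval_points` is hypothetical, and `FakeView` and `FakeResult` are stand-ins that run each call synchronously so the sketch can be executed without engines; with a real load-balanced view, `apply_async`, `wait`, and `get` are used the same way.

```python
def eval_points(view, func, data, theta):
    """Submit func(point, theta) for every point and return results in order."""
    async_results = [view.apply_async(func, point, theta) for point in data]
    # The view can wait on its own jobs; no Client argument is needed.
    view.wait(async_results)
    # If the client is needed (e.g. for cache cleanup), it is view.client.
    return [ar.get() for ar in async_results]


class FakeResult(object):
    """Hypothetical stand-in for AsyncResult."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value


class FakeView(object):
    """Hypothetical stand-in for a view; runs calls immediately."""

    def apply_async(self, f, *args):
        return FakeResult(f(*args))

    def wait(self, jobs=None, timeout=-1):
        return True


values = eval_points(FakeView(), lambda p, t: (p * t, p + t), [1, 2, 3], 2)
```

The returned list of (cost, grad) tuples can then be unzipped and averaged exactly as in the original eval.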
