
The Archive Base


Editorial Team
Asked: June 6, 2026

I am trying to handle the data generated by the following piece of code:


dic_score = {}
for Gnodes in G.nodes():       # Gnodes iterates over 10000 values
    Gvalue = someoperation(Gnodes)
    for Hnodes in H.nodes():   # Hnodes iterates over 10000 values
        Hvalue = someoperation(Hnodes)
        score = SomeOperation(Gvalue, Hvalue)  # placeholder scoring function
        dic_score.setdefault(Gnodes, []).append([Hnodes, score, -1])

Since the dictionary is large (10,000 keys, each holding a list of 10,000 three-element lists), it is difficult to keep it in memory. I was looking for a solution that stores each key:value pair (the value being a list) as soon as it is generated. In an answer to "Writing and reading a dictionary in specific format (Python)", it was suggested to use ZODB in combination with BTrees.
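To see why that many entries will not fit comfortably, here is a rough back-of-the-envelope estimate. It assumes CPython on a 64-bit platform and that each entry is a list `[int, float, int]` (that the score is a float is an assumption; the question does not say). `sys.getsizeof` reports only shallow sizes, so treat the result as a lower-bound sketch rather than a measurement.

```python
import sys

N_G = 10_000  # number of G nodes (from the question)
N_H = 10_000  # number of H nodes (from the question)

# One sample entry: [Hnodes, score, -1]; score is assumed to be a float.
sample_entry = [12_345, 0.75, -1]

# Shallow sizes of the list and the objects it references.
# Small ints such as -1 are cached and shared by CPython, so they are not counted.
per_entry = (
    sys.getsizeof(sample_entry)       # the 3-element list itself
    + sys.getsizeof(sample_entry[0])  # the Hnodes int
    + sys.getsizeof(sample_entry[1])  # the float score
    + 8                               # pointer slot in the outer per-key list
)

total_bytes = per_entry * N_G * N_H
print(f"~{per_entry} bytes per entry, ~{total_bytes / 1e9:.1f} GB in total")
```

On such a platform this comes out at more than ten gigabytes before counting the dictionary and its 10,000 outer lists.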

Bear with me if this is too naive. My question is: when should one call transaction.commit() to commit the data? If I call it at the end of the inner loop, the resulting file is extremely large (I'm not sure why). Here is a snippet:

import transaction
from BTrees.IOBTree import IOBTree
from persistent.list import PersistentList
from ZODB import DB
from ZODB.FileStorage import FileStorage

storage = FileStorage('Data.fs')
db = DB(storage)
connection = db.open()
root = connection.root()
btree_container = IOBTree()
root[0] = btree_container
for nodes in G.nodes():
    btree_container[nodes] = PersistentList()  # I was losing data prior to doing this

for Gnodes in G.nodes():       # Gnodes iterates over 10000 values
    Gvalue = someoperation(Gnodes)
    for Hnodes in H.nodes():   # Hnodes iterates over 10000 values
        Hvalue = someoperation(Hnodes)
        score = SomeOperation(Gvalue, Hvalue)  # placeholder scoring function
        btree_container.setdefault(Gnodes, []).append([Hnodes, score, -1])
        transaction.commit()

What if I call it outside both the loops? Something like:

    ......
       ......
          score = SomeOperation(Gvalue, Hvalue)
          btree_container.setdefault(Gnodes,[]).append([Hnodes, score, -1 ])
    transaction.commit()

Will all the data be held in memory until I call transaction.commit()? Again, I am not sure why, but this results in a smaller file on disk.

I want to minimize the amount of data held in memory. Any guidance would be appreciated!


1 Answer

    Editorial Team
    Added an answer on June 6, 2026 at 9:13 pm

    Your goal is to make your process manageable within memory constraints. To be able to do this with the ZODB as a tool you need to understand how ZODB transactions work, and how to use them.

    Why your ZODB grows so large

    First of all you need to understand what a transaction commit does here, which also explains why your Data.fs is getting so large.

    The ZODB writes data out per transaction: any persistent object that has changed gets written to disk. The important detail here is "persistent object that has changed"; the ZODB works in units of persistent objects.

    Not every Python value is a persistent object. A plain Python class you define yourself is not persistent, and neither are built-in types such as int or list. On the other hand, any class you define that inherits from persistent.Persistent is a persistent object. The BTrees classes, as well as the PersistentList class you use in your code, inherit from Persistent.
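A practical consequence of this distinction is that ZODB only notices changes made through a persistent object's own attribute assignments (or inside a persistent container such as `PersistentList`); mutating a plain `list` stored on it in place goes unnoticed unless you set `_p_changed = True` yourself. The sketch below is a toy stand-in, not the real `persistent.Persistent` machinery: `ToyPersistent`, its `changed` flag and `mark_saved` are hypothetical and only mimic the dirty-tracking idea, so it runs without ZODB installed.

```python
class ToyPersistent:
    """Toy model of dirty tracking: assigning an attribute marks the object changed."""

    def __init__(self):
        # Bypass our own __setattr__ so construction does not count as a change.
        object.__setattr__(self, "changed", False)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        object.__setattr__(self, "changed", True)

    def mark_saved(self):
        """Pretend a commit just wrote this object out."""
        object.__setattr__(self, "changed", False)


obj = ToyPersistent()
obj.scores = []                  # attribute assignment: detected
print(obj.changed)               # True

obj.mark_saved()
obj.scores.append([1, 0.5, -1])  # in-place mutation of a plain list: not detected
print(obj.changed)               # False, so this change would not be written

obj.scores = obj.scores          # reassigning marks the owner as changed again
print(obj.changed)               # True
```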

    Now, on a transaction commit, any persistent object that has changed is written to disk as part of that transaction. So any PersistentList object that has been appended to will be written to disk in its entirety. BTrees handle this more efficiently: they store Buckets, themselves persistent, which in turn hold the actual stored objects. So for every few new nodes you create, only a Bucket is written to the transaction, not the whole BTree structure. Note that because the items held in the tree are themselves persistent objects, only references to them are stored in the Bucket records.
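A little arithmetic shows how much this matters for the loop in the question. If each commit rewrites the whole `PersistentList` for the current `Gnodes`, committing after every append writes 1 + 2 + … + n entries for a list that ends up with n entries, while committing once writes each entry once. The sketch below only counts list entries written; it ignores record overhead, the BTree buckets and the rest of the loop, so treat it as an order-of-magnitude model. `entries_written` is a hypothetical helper.

```python
def entries_written(list_length, commit_every):
    """Count list entries written while a list grows to list_length.

    Models a persistent list that is rewritten in full on every commit,
    with a commit after every `commit_every` appends, plus a final commit
    if the last batch is partial.
    """
    total = 0
    for length in range(commit_every, list_length + 1, commit_every):
        total += length        # the whole list, as it is at this point, is written
    if list_length % commit_every:
        total += list_length   # final commit for the leftover appends
    return total


N_H = 10_000  # entries per Gnodes list (from the question)
per_append = entries_written(N_H, commit_every=1)
once = entries_written(N_H, commit_every=N_H)
print(per_append, once, per_append // once)  # 50005000 10000 5000
```

Multiplied over 10,000 `Gnodes`, that is roughly 5 × 10^11 entries written versus 10^8, which is consistent with the per-append commit producing a far larger Data.fs.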

    Now, the ZODB writes transaction data by appending it to the Data.fs file, and it does not remove old data automatically. It can reconstruct the current state of the database by finding the most recent version of each object in the store. This is why your Data.fs is growing so much: you are writing out new versions of ever larger PersistentList instances as transactions are committed.

    Removing the old data is called packing, which is similar to the VACUUM command in PostgreSQL and other relational databases. Simply call the .pack() method on the db variable to remove all old revisions, or use the t and days parameters of that method to limit how much history is discarded: t is a time.time() timestamp (seconds since the epoch) to pack to, and days is a number of days subtracted from t (or from the current time if t is not given). Packing should shrink your data file considerably, because the partial lists stored by older transactions are removed. Do note that packing is an expensive operation and can take a while, depending on the size of your dataset.
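To make the append-and-pack behaviour concrete, here is a toy in-memory model. It is not how FileStorage is implemented (real packing also drops objects that are no longer reachable and works on transaction records); `ToyAppendOnlyStore` and its methods are hypothetical. What it does mirror: writes only ever append, a load returns the most recent revision, and `pack(t, days)` computes its cutoff as `t` (or the current time) minus `days` days, discarding revisions already superseded at that cutoff.

```python
import time

SECONDS_PER_DAY = 86_400


class ToyAppendOnlyStore:
    """Toy model of an append-only object store (not FileStorage itself)."""

    def __init__(self):
        self.records = []  # (timestamp, oid, state), in write order

    def write(self, oid, state, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        self.records.append((ts, oid, list(state)))  # never overwrite, always append

    def load(self, oid):
        # The most recent revision of an object is its current state.
        for _ts, rec_oid, state in reversed(self.records):
            if rec_oid == oid:
                return state
        raise KeyError(oid)

    def pack(self, t=None, days=0):
        cutoff = (time.time() if t is None else t) - days * SECONDS_PER_DAY
        # For each object, find the newest revision written at or before the cutoff.
        newest_at_cutoff = {}
        for index, (ts, oid, _state) in enumerate(self.records):
            if ts <= cutoff:
                newest_at_cutoff[oid] = index
        # Keep that revision plus everything written after the cutoff.
        self.records = [
            record
            for index, record in enumerate(self.records)
            if record[0] > cutoff or newest_at_cutoff.get(record[1]) == index
        ]


store = ToyAppendOnlyStore()
growing = []
for step in range(5):  # "commit" a growing list five times
    growing.append(step)
    store.write("list-1", growing, timestamp=1_000 + step)

print(len(store.records), store.load("list-1"))  # 5 [0, 1, 2, 3, 4]
store.pack(t=2_000)                               # cutoff is after every write
print(len(store.records), store.load("list-1"))  # 1 [0, 1, 2, 3, 4]
```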

    Using transactions to manage memory

    You are trying to build a very large dataset, using persistence to work around memory constraints, and you are using transaction commits to try to flush things to disk. Normally, however, a transaction commit signals that you have completed constructing your dataset, something you can use as one atomic whole.

    What you need to use here is a savepoint. Savepoints are essentially subtransactions: points during the whole transaction at which you can ask for data to be temporarily stored on disk. They are made permanent when you commit the transaction. To create a savepoint, call the .savepoint method on the transaction:

    for Gnodes in G.nodes():      # Gnodes iterates over 10000 values
        Gvalue = someoperation(Gnodes)
        for Hnodes in H.nodes():  # Hnodes iterates over 10000 values
            Hvalue = someoperation(Hnodes)
            score = SomeOperation(Gvalue, Hvalue)  # placeholder scoring function
            btree_container.setdefault(Gnodes, PersistentList()).append(
                [Hnodes, score, -1])
        transaction.savepoint(True)  # optimistic savepoint once each Gnodes list is complete
    transaction.commit()             # commit once, after the whole data set is processed
    

    In the above example I set the optimistic flag to True, meaning: I do not intend to roll back to this savepoint. Some storages do not support rolling back, and signalling that you do not need it makes your code work with them.

    Also note that the transaction.commit() happens when the whole data set has been processed, which is what a commit is supposed to achieve.

    One thing a savepoint does is trigger a garbage collection of the ZODB caches, which means that any data not currently in use is removed from memory.

    Note the "not currently in use" part there: if any of your code holds on to large values in a variable, that data cannot be cleared from memory. As far as I can determine from the code you've shown us, this looks fine. But I do not know how your operations work or how you generate the nodes; be careful not to build complete lists in memory when an iterator will do, or to build large dictionaries that reference all of your lists of lists, for example.
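As an illustration of the "iterator instead of a full list" advice: a generator produces values one at a time, so its own footprint stays constant, while a list holds every value at once. `node_values` is a hypothetical stand-in for however the node values are actually produced; `sys.getsizeof` again reports shallow sizes only.

```python
import sys


def node_values(n):
    """Hypothetical stand-in for applying someoperation() to each node, lazily."""
    for node in range(n):
        yield node * 0.5


as_list = [v for v in node_values(10_000)]  # all 10,000 floats held at once
as_gen = node_values(10_000)                # nothing computed yet

print(sys.getsizeof(as_list), sys.getsizeof(as_gen))

# Consuming the generator still visits every value, one at a time.
total = sum(as_gen)
print(total)  # 24997500.0
```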

    You can experiment a little with where you create your savepoints; you could create one every time you've processed an Hnodes value, or only when you are done with a Gnodes loop, as I've done above. You are constructing a list per Gnodes, so it would be kept in memory while looping over all of H.nodes() anyway, and flushing it to disk would probably only make sense once you've constructed it in full.
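One way to experiment with savepoint frequency is to pull the "every N items" logic into a small helper. This is a sketch: `run_with_savepoints`, `handle` and `make_savepoint` are hypothetical names, and the savepoint is passed in as a plain callable so the helper can be tried without a database; with ZODB you would pass something like `lambda: transaction.savepoint(True)`.

```python
def run_with_savepoints(items, handle, make_savepoint, every=100):
    """Call handle(item) for each item and make_savepoint() after every `every` items.

    Returns the number of items processed.
    """
    if every < 1:
        raise ValueError("every must be at least 1")
    processed = 0
    for processed, item in enumerate(items, start=1):
        handle(item)
        if processed % every == 0:
            make_savepoint()
    return processed


# Dry run with lists standing in for real work and real savepoints.
handled = []
savepoints = []
count = run_with_savepoints(
    range(250),
    handle=handled.append,
    make_savepoint=lambda: savepoints.append(len(handled)),
    every=100,
)
print(count, savepoints)  # 250 [100, 200]
```

With one `PersistentList` per `Gnodes`, the natural choice is to pass `G.nodes()` with `every=1` and do the inner `H.nodes()` loop inside `handle`.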

    If, however, you find that you need to clear memory more often, consider using either a BTrees.OOBTree.TreeSet or a BTrees.IOBTree.BTree instead of a PersistentList, to break your data up into more persistent objects. A TreeSet is ordered but not (easily) indexable, while a BTree can be used as a list by using simple incrementing integer keys:

    for i, Hnodes in enumerate(H.nodes()):
        ...
        btree_container.setdefault(Gnodes, IOBTree())[i] = [Hnodes, score, -1]
        if i % 100 == 0:
            transaction.savepoint(True)
    

    The above code uses a BTree instead of a PersistentList and creates a savepoint every 100 Hnodes processed. Because the BTree uses buckets, which are persistent objects in themselves, the structure can be flushed to a savepoint piece by piece, without having to stay in memory until all of H.nodes() have been processed.
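The benefit of splitting the data into buckets can be shown with a toy model that counts how many entries each savepoint has to write. `ToyChunkedList` and `simulate` are hypothetical and much simpler than a real BTree (which splits full buckets and keeps an index over them); the model only captures the point that appending dirties just the last chunk, whereas a single persistent list is dirty as a whole.

```python
class ToyChunkedList:
    """Toy model of bucketed storage: only chunks changed since the last flush are written."""

    def __init__(self, chunk_size):
        self.chunk_size = chunk_size
        self.chunks = []
        self.dirty = set()  # indexes of chunks changed since the last flush

    def append(self, value):
        if not self.chunks or len(self.chunks[-1]) == self.chunk_size:
            self.chunks.append([])
        self.chunks[-1].append(value)
        self.dirty.add(len(self.chunks) - 1)

    def flush(self):
        """Simulate a savepoint: return how many entries would be written."""
        written = sum(len(self.chunks[i]) for i in self.dirty)
        self.dirty.clear()
        return written


def simulate(n_items, flush_every, chunk_size):
    data = ToyChunkedList(chunk_size)
    written = 0
    for i in range(1, n_items + 1):
        data.append(i)
        if i % flush_every == 0:
            written += data.flush()
    return written + data.flush()  # final flush for any leftovers


# One huge chunk behaves like a single PersistentList rewritten in full.
print(simulate(1_000, flush_every=100, chunk_size=1_000_000))  # 5500
print(simulate(1_000, flush_every=100, chunk_size=100))        # 1000
```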



© 2021 The Archive Base. All Rights Reserved