
Asked by Editorial Team on May 30, 2026

I have an expensive function that takes and returns a small amount of data


I have an expensive function that takes and returns a small amount of data (a few integers and floats). I have already memoized this function, but I would like to make the memo persistent. There are already a couple of threads relating to this, but I’m unsure about potential issues with some of the suggested approaches, and I have some fairly specific requirements:

  • I will definitely call the function from multiple threads and processes simultaneously (both via multiprocessing and from separate Python scripts)
  • I will not need read or write access to the memo from outside this python function
  • I am not too concerned about the memo occasionally being corrupted (e.g. by pulling the plug, or by accidentally writing to the file without locking it), since it isn't that expensive to rebuild (typically 10-20 minutes). However, I would prefer that it not be corrupted by exceptions or by manually terminating a Python process (I don't know how realistic that is)
  • I would strongly prefer solutions that don’t require large external libraries as I have a severely limited amount of hard disk space on one machine I will be running the code on
  • I have a weak preference for cross-platform code, but I will likely only use this on Linux
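As a rough illustration of the "not corrupted by exceptions" requirement, here is a minimal sketch assuming the standard-library sqlite3 route (the table `memo` and the function `rows_after_failed_update` are names made up for the demo). An exception raised inside a `with connection:` block rolls the open transaction back, so a half-finished update never becomes visible. SQLite's journal serves the same purpose when a process is killed mid-write: the incomplete transaction is rolled back the next time the database is opened.

```python
import sqlite3


def rows_after_failed_update():
    """Insert a row, then fail before the transaction can commit."""
    # ":memory:" keeps the demo self-contained; a file path behaves the same way.
    connection = sqlite3.connect(":memory:")
    connection.execute("create table memo (key blob primary key, value blob)")
    try:
        with connection:  # commits on success, rolls back if the block raises
            connection.execute("insert into memo values (?, ?)", (b"k", b"v"))
            raise RuntimeError("simulated failure in the middle of an update")
    except RuntimeError:
        pass
    count = connection.execute("select count(*) from memo").fetchone()[0]
    connection.close()
    return count


print(rows_after_failed_update())  # 0: the insert was rolled back
```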

This thread discusses the shelve module, which is apparently not process-safe. Two of the answers suggest using fcntl.flock to lock the shelve file. Some of the responses in this thread, however, seem to suggest that this is fraught with problems – but I’m not exactly sure what they are. It sounds as though this is limited to Unix (though apparently Windows has an equivalent called msvcrt.locking), and the lock is only ‘advisory’ – i.e., it won’t stop me from accidentally writing to the file without checking it is locked. Are there any other potential problems? Would writing to a copy of the file, and replacing the master copy as a final step, reduce the risk of corruption?
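To make the two ideas in this paragraph concrete, here is a minimal Unix-only sketch (Python 3.3+ for `os.replace`) that combines an advisory `fcntl.flock` lock with the write-a-copy-then-replace approach. It assumes the memo is a plain pickled dict rather than a shelve file, because some dbm backends behind shelve spread data over several files, which makes a single atomic rename awkward. The helper names `read_memo` and `update_memo` and the `.lock` suffix are hypothetical. The replace step means a reader sees either the old file or the new one, never a half-written one; the lock is still only advisory, so any code that bypasses `update_memo` can still lose updates.

```python
import fcntl
import os
import pickle
import tempfile


def read_memo(path):
    """Load the whole memo; readers see either the old or the new file."""
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return {}


def update_memo(path, key, value):
    """Add one entry to the pickled dict at `path` under an exclusive flock."""
    # Lock a separate file that is never replaced: after os.replace, `path`
    # names a new inode, so a lock held on the old one would protect nothing.
    with open(path + ".lock", "a") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until no other writer holds it
        try:
            memo = read_memo(path)
            memo[key] = value
            # Write a complete copy in the same directory (same filesystem)...
            directory = os.path.dirname(os.path.abspath(path))
            fd, tmp_path = tempfile.mkstemp(dir=directory)
            try:
                with os.fdopen(fd, "wb") as tmp:
                    pickle.dump(memo, tmp)
                    tmp.flush()
                    os.fsync(tmp.fileno())  # make sure the bytes reach the disk first
                # ...then swap it in with one atomic rename.
                os.replace(tmp_path, path)
            except BaseException:
                os.unlink(tmp_path)
                raise
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

The cost of this design is that every update rewrites the whole file, which is fine for a small memo but grows linearly with its size.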

It doesn’t look as though the dbm module will do any better than shelve. I’ve had a quick look at sqlite3, but it seems a bit overkill for this purpose. This thread and this one mention several 3rd party libraries, including ZODB, but there are a lot of choices, and they all seem overly large and complicated for this task.

Does anyone have any advice?

UPDATE: kindall mentioned IncPy below, which does look very interesting. Unfortunately, I wouldn’t want to move back to Python 2.6 (I’m actually using 3.2), and it looks like it is a bit awkward to use with C libraries (I make heavy use of numpy and scipy, among others).

kindall’s other idea is instructive, but I think adapting this to multiple processes would be a little difficult – I suppose it would be easiest to replace the queue with file locking or a database.

Looking at ZODB again, it does look perfect for the task, but I really do want to avoid using any additional libraries. I’m still not entirely sure what all the issues with simply using flock are – I imagine one big problem is if a process is terminated while writing to the file, or before releasing the lock?
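On the specific worry about a process dying while it holds the lock: on Linux, a flock lock belongs to the open file description, and the kernel releases it when the last descriptor referring to it is closed, which happens when the process is killed. The sketch below (Linux-only because it uses `os.fork`; the function name `lock_survives_kill` is made up) checks this by letting a child take the lock and then killing it with SIGKILL, so none of its cleanup code runs. It does not address the other half of the worry: a process that dies in the middle of writing can still leave the data file half-written.

```python
import fcntl
import os
import signal
import tempfile
import time


def lock_survives_kill(lock_path):
    """Return (blocked_while_holder_alive, acquired_after_holder_killed)."""
    read_end, write_end = os.pipe()
    pid = os.fork()
    if pid == 0:  # child: take the lock, report it, then hang
        try:
            os.close(read_end)
            fd = os.open(lock_path, os.O_RDWR | os.O_CREAT)
            fcntl.flock(fd, fcntl.LOCK_EX)
            os.write(write_end, b"x")
            time.sleep(60)
        finally:
            os._exit(0)  # never fall back into the parent's code

    os.close(write_end)
    os.read(read_end, 1)  # wait until the child reports that it holds the lock
    os.close(read_end)
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            blocked = False
        except BlockingIOError:
            blocked = True
        os.kill(pid, signal.SIGKILL)  # no finally blocks or atexit handlers run
        os.waitpid(pid, 0)
        # Raises BlockingIOError if the dead child's lock were still held.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return blocked, True
    finally:
        os.close(fd)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "memo.lock")
    print(lock_survives_kill(path))  # (True, True) on Linux
```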

So, I’ve taken synthesizerpatel’s advice and gone with sqlite3. If anyone’s interested, I decided to make a drop-in replacement for dict that stores its entries as pickles in a database (I don’t bother keeping any entries in memory, as database access and pickling are fast enough compared to everything else I’m doing). I’m sure there are more efficient ways of doing this (and I’ve no idea whether I might still have concurrency issues), but here is the code:

try:
    from collections.abc import MutableMapping  # Python 3.3+
except ImportError:
    from collections import MutableMapping  # Python 3.2 and earlier
from contextlib import contextmanager
import sqlite3
import pickle


class PersistentDict(MutableMapping):
    """A dict-like mapping that stores pickled keys and values in an SQLite file."""

    def __init__(self, dbpath, iterable=None, **kwargs):
        self.dbpath = dbpath
        with self.get_connection() as connection:
            cursor = connection.cursor()
            cursor.execute(
                'create table if not exists memo '
                '(key blob primary key not null, value blob not null)'
            )
        if iterable is not None:
            self.update(iterable)
        self.update(kwargs)

    def encode(self, obj):
        return pickle.dumps(obj)

    def decode(self, blob):
        return pickle.loads(blob)

    @contextmanager
    def get_connection(self):
        # Using a sqlite3 connection as a context manager commits (or rolls back
        # on an exception) but does not close it, so close it explicitly here.
        connection = sqlite3.connect(self.dbpath)
        try:
            with connection:
                yield connection
        finally:
            connection.close()

    def __getitem__(self, key):
        blob = self.encode(key)
        with self.get_connection() as connection:
            cursor = connection.cursor()
            cursor.execute(
                'select value from memo where key=?',
                (blob,)
            )
            value = cursor.fetchone()
        if value is None:
            raise KeyError(key)  # report the original key, not its pickle
        return self.decode(value[0])

    def __setitem__(self, key, value):
        key = self.encode(key)
        value = self.encode(value)
        with self.get_connection() as connection:
            cursor = connection.cursor()
            cursor.execute(
                'insert or replace into memo values (?, ?)',
                (key, value)
            )

    def __delitem__(self, key):
        blob = self.encode(key)
        with self.get_connection() as connection:
            cursor = connection.cursor()
            cursor.execute(
                'delete from memo where key=?',
                (blob,)
            )
            # rowcount is 0 if nothing matched; raising here also rolls back.
            if cursor.rowcount == 0:
                raise KeyError(key)

    def __iter__(self):
        with self.get_connection() as connection:
            cursor = connection.cursor()
            cursor.execute(
                'select key from memo'
            )
            records = cursor.fetchall()
        for r in records:
            yield self.decode(r[0])

    def __len__(self):
        with self.get_connection() as connection:
            cursor = connection.cursor()
            cursor.execute(
                'select count(*) from memo'
            )
            return cursor.fetchone()[0]
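If the only goal is to memoize a single function, the same storage idea can also be packaged as a decorator rather than a full MutableMapping. This is a minimal sketch, not the code above: the decorator name `persistent_memo`, the `memo_<function name>` table naming, the positional-arguments-only restriction, the 30-second busy timeout, and pinning pickle protocol 2 (so key bytes do not change when the default protocol changes between Python versions) are all assumptions made for illustration. The expensive call runs outside any write transaction, and `insert or ignore` keeps whichever result was stored first if two processes happen to compute the same arguments at once.

```python
import functools
import os
import pickle
import sqlite3
import tempfile

PICKLE_PROTOCOL = 2  # assumption: pinned so keys don't depend on the default protocol


def persistent_memo(dbpath):
    """Hypothetical decorator that memoizes a function's results in an SQLite file."""
    def decorator(func):
        table = "memo_" + func.__name__  # assumption: one table per decorated function

        connection = sqlite3.connect(dbpath, timeout=30)
        try:
            with connection:
                connection.execute(
                    "create table if not exists {} "
                    "(key blob primary key, value blob not null)".format(table)
                )
        finally:
            connection.close()

        @functools.wraps(func)
        def wrapper(*args):  # positional arguments only, to keep keys simple
            key = pickle.dumps(args, protocol=PICKLE_PROTOCOL)
            # timeout=30: wait up to 30 s for another process's write lock
            connection = sqlite3.connect(dbpath, timeout=30)
            try:
                row = connection.execute(
                    "select value from {} where key=?".format(table), (key,)
                ).fetchone()
                if row is not None:
                    return pickle.loads(row[0])
                result = func(*args)  # the expensive call, outside any transaction
                with connection:
                    connection.execute(
                        "insert or ignore into {} values (?, ?)".format(table),
                        (key, pickle.dumps(result, protocol=PICKLE_PROTOCOL)),
                    )
                return result
            finally:
                connection.close()
        return wrapper
    return decorator


# Usage with a stand-in for the real expensive function.
calls = []


@persistent_memo(os.path.join(tempfile.mkdtemp(), "memo.sqlite"))
def expensive(n, x):
    calls.append((n, x))
    return n * x + 0.5


print(expensive(2, 3.0), expensive(2, 3.0), len(calls))  # 6.5 6.5 1
```

One caveat of keying on pickled arguments: values that compare equal but pickle differently, such as `1` and `1.0`, end up as separate cache entries.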

1 Answer

Answered by Editorial Team on May 30, 2026 at 9:07 am

sqlite3 provides ACID transactions out of the box. Hand-rolled file locking is prone to race conditions and concurrency problems that you won’t have with sqlite3.

Basically, yes, sqlite3 is more than you need, but it’s not a huge burden. It runs on mobile phones, so it’s not as if you’re committing to running some beastly piece of software, and the sqlite3 module ships with Python’s standard library in most installations, so there is nothing extra to install. It will save you time reinventing wheels and debugging locking issues.
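To see the concurrency claim in practice, here is a small sketch (an illustration, not something from the answer) in which several threads, each with its own connection as separate processes would have, write to the same database file at the same time. It assumes two settings that are common for this kind of workload: `pragma journal_mode=wal`, which lets readers continue while a writer commits, and a connection `timeout`, which makes a writer wait for the lock instead of immediately raising "database is locked". The function name, table layout and worker counts are arbitrary.

```python
import os
import sqlite3
import tempfile
import threading


def concurrent_inserts(dbpath, n_workers=4, per_worker=50):
    """Let several writers insert into one file; return (journal mode, row count)."""
    setup = sqlite3.connect(dbpath)
    # WAL mode is stored in the database file, so later connections use it too.
    mode = setup.execute("pragma journal_mode=wal").fetchone()[0]
    with setup:
        setup.execute(
            "create table if not exists memo (key integer primary key, value real)"
        )
    setup.close()

    def worker(first_key):
        # Each writer gets its own connection; timeout=30 waits for the write lock.
        connection = sqlite3.connect(dbpath, timeout=30)
        try:
            for key in range(first_key, first_key + per_worker):
                with connection:  # one short transaction per entry
                    connection.execute(
                        "insert or replace into memo values (?, ?)", (key, key * 0.5)
                    )
        finally:
            connection.close()

    threads = [
        threading.Thread(target=worker, args=(w * per_worker,))
        for w in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    check = sqlite3.connect(dbpath)
    count = check.execute("select count(*) from memo").fetchone()[0]
    check.close()
    return mode, count


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "memo.sqlite")
    print(concurrent_inserts(path))  # ('wal', 200)
```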
