I have written a small wrapper for the dict built-in class that loads entries (values) of a dictionary from cPickled files as soon as the respective key is first accessed. When the dictionary is destroyed all loaded entries are written back to disk.
Now it would be convenient if I could check if any of the values has been changed and write out only those that in fact have been. My question therefore is: Does a dictionary know if a value has been changed? Or is there a clever way to transparently implement this?
For completeness I attach the code I use. It’s called with the path where the files are stored (keys are used as filenames) and with a list of keys for which files exist.
import cPickle
class DictDB(dict):
def __init__(self, path, folders):
self.picklepath = path # path to files on disk
self.folders = folders # available folders
self.loaded_folders = {}
def has_key(self, key):
return key in self.folders
def get(self, key):
if not key in self.loaded_folders.keys():
if not key in self.folders:
raise KeyError("Folder "+key+" not available")
# load from disk
self.loaded_folders[key] = cPickle.load(file(self.picklepath + key + ".cpickle2"))
return self.loaded_folders[key]
def __getitem__(self, key):
return self.get(key)
def close(self):
for folder in self.loaded_folders.keys():
# write back
cPickle.dump(self.loaded_folders[folder], file(picklepath + folder + '.cpickle2', 'w'), 2)
def __del__(self):
self.close()
I might approach it with a sort of publish-subscribe model, where the containing dictionary subscribes to each of the sub dictionaries (or other values). Then when one of them is edited, it notifies any dictionaries that contain it.
If you don’t want them all deal with the wiring for that and are willing to allow the containing dictionary to only check for changes on access or at set intervals, you can have each contained object keep track of a
versionnumber. Then when the containing dictionary is ready, it simply checks to see if that version number has changed.A final possibility would be to have a way of reliably calculating hash values for contained objects in place. This would let you write an external function and remove the need for the objects to track their own versions, but has its own complexities since you’ll either need to overload
__hash__on all of them or write another form of thehash()function that can identify the object and take some sort of intelligent hashed value from it