I was going to ask “How to pickle a class that inherits from dict and defines __slots__”. Then I realized the utterly mind-wrenching solution in class B below actually works…
```python
import pickle

class A(dict):
    __slots__ = ["porridge"]
    def __init__(self, porridge): self.porridge = porridge

class B(A):
    __slots__ = ["porridge"]
    def __getstate__(self):
        # Returning the very item being pickled in 'self'??
        return self, self.porridge
    def __setstate__(self, state):
        print "__setstate__(%s) type(%s, %s)" % (state, type(state[0]),
                                                type(state[1]))
        self.update(state[0])
        self.porridge = state[1]
```
Here is some output:
```
>>> saved = pickle.dumps(A(10))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled
>>> b = B('delicious')
>>> b['butter'] = 'yes please'
>>> loaded = pickle.loads(pickle.dumps(b))
__setstate__(({'butter': 'yes please'}, 'delicious')) type(<class '__main__.B'>, <type 'str'>)
>>> b
{'butter': 'yes please'}
>>> b.porridge
'delicious'
```
So basically, pickle cannot pickle a class that defines __slots__ without also defining __getstate__. This is a problem if the class inherits from dict: how do you return the contents of the instance without returning self, the very instance pickle is already trying to pickle and can't pickle without calling __getstate__? Notice how __setstate__ actually receives an instance of B as part of the state.
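A quick way to see where this restriction lives, assuming Python 3 rather than the Python 2 used in the question: the check sits in the protocol 0 and 1 code path (the copyreg module, spelled copy_reg in Python 2, whose default protocol was 0), while protocol 2 and later pickle the slot values of this A on their own. A sketch that tries every protocol the interpreter supports:

```python
import pickle


class A(dict):
    __slots__ = ["porridge"]

    def __init__(self, porridge):
        super().__init__()
        self.porridge = porridge


# Map each protocol to ("ok", restored slot value) or ("TypeError", message).
outcome = {}
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    try:
        restored = pickle.loads(pickle.dumps(A(10), protocol=proto))
        outcome[proto] = ("ok", restored.porridge)
    except TypeError as exc:
        outcome[proto] = ("TypeError", str(exc))
    print(proto, *outcome[proto])
```

Protocols 0 and 1 should report the TypeError from the question; the exact message wording differs between Python versions.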
Well, it works… but can someone explain why? Is it a feature or a bug?
Maybe I’m a bit late to the party, but this question didn’t get an answer that actually explains what’s happening, so here we go.
Here’s a quick summary for those who don’t want to read this whole post (it got a bit long…):
- You don't need to take care of the contained dict instance in __getstate__(): pickle will do this for you.
- If you include self in the state anyway, pickle's cycle detection will prevent an infinite loop.
Writing __getstate__() and __setstate__() methods for custom classes derived from dict
Let's start with the right way to write the __getstate__() and __setstate__() methods of your class. You don't need to take care of pickling the contents of the dict instance contained in B instances: pickle knows how to deal with dictionaries and will do this for you. So it is enough for __getstate__() to return the value of the porridge slot and for __setstate__() to restore it.
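A minimal sketch of that approach, assuming Python 3 (the question uses Python 2). B declares no new slots because porridge is already inherited from A, and the dictionary item 42: "answer" is a hypothetical value used only for the round trip:

```python
import pickle


class A(dict):
    __slots__ = ["porridge"]

    def __init__(self, porridge):
        super().__init__()
        self.porridge = porridge


class B(A):
    # No new slots: "porridge" is inherited from A.
    __slots__ = ()

    def __getstate__(self):
        # Only the slot value; pickle stores the dict items on its own.
        return self.porridge

    def __setstate__(self, state):
        # Called after the dict items have already been restored.
        self.porridge = state


a = B("oats")
a[42] = "answer"

# Round-trip with every protocol this interpreter supports.
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    b = pickle.loads(pickle.dumps(a, protocol=proto))
    print(proto, b, b.porridge)
```

Each protocol should print the restored dictionary together with oats. One caveat, based on how copyreg._reduce_ex handles protocols 0 and 1: a falsy state (an empty string, say) is dropped on that path, so __setstate__() would not be called at all; returning a tuple such as (self.porridge,) sidesteps that.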
What’s happening in your implementation?
Why does your implementation work as well, and what's happening under the hood? That's a bit more involved, but once we know that the dictionary gets pickled anyway, it's not too hard to figure out. If the pickle module encounters an instance of a user-defined class, it calls the __reduce__() method of this class, which in turn calls __getstate__() (actually, it usually calls the __reduce_ex__() method, but that does not matter here). Let's define B again as you originally did, i.e. using the "recursive" definition of __getstate__(), and call __reduce__() on an instance a = B("oats") that also holds some dictionary items. Under Python 2, the result has the form (copy_reg._reconstructor, (B, dict, {...}), (a, 'oats')), where {...} is a plain dict copy of the items.
As we can see from the documentation of __reduce__(), the method returns a tuple of 2 to 5 elements. The first element is a function that will be called to reconstruct the instance when unpickling, the second element is the tuple of arguments that will be passed to this function, and the third element is the return value of __getstate__(). We can already see that the dictionary information is included twice. The function _reconstructor() is an internal function of the copy_reg module that creates the new B instance and initialises its dict part before __setstate__() is called when unpickling. (Have a look at the source code of this function if you like; it's short!)
Now the pickler needs to pickle the return value of a.__reduce__(). It basically pickles the three elements of this tuple one after the other. The second element is a tuple again, and its items are also pickled one after the other. The third item of this inner tuple (i.e. a.__reduce__()[1][2]) is of type dict and is pickled using the internal pickler for dictionaries. The third element of the outer tuple (i.e. a.__reduce__()[2]) is also a tuple, consisting of the B instance itself and a string. When pickling the B instance, the cycle detection of the pickle module kicks in: pickle realises this exact instance has already been dealt with (its memo is keyed by id()), and only writes a reference to the memoized object instead of really pickling it again. This is why no infinite loop occurs.
When unpickling this mess again, the unpickler first reads the reconstruction function and its arguments from the stream. The function is called, resulting in a B instance with the dictionary part already initialised. Next, the unpickler reads the state. It encounters a tuple consisting of a reference to an already unpickled object, namely our instance of B, and a string, "oats". This tuple is now passed to B.__setstate__(). So state[0] and self are the same object, as can be seen by adding a line that prints self is state[0] to your __setstate__() implementation (it prints True!). The line self.update(state[0]) consequently simply updates the instance with itself.
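To watch this happen, here is a sketch that mirrors the walkthrough above, assuming Python 3 (where copy_reg is named copyreg and a.__reduce__() still takes the protocol 0 path through copyreg._reduce_ex). The identity_checks list and the 42: "answer" item are hypothetical additions for illustration:

```python
import pickle


class A(dict):
    __slots__ = ["porridge"]

    def __init__(self, porridge):
        super().__init__()
        self.porridge = porridge


identity_checks = []  # hypothetical helper: records "state[0] is self"


class B(A):
    __slots__ = ()  # "porridge" is inherited from A

    def __getstate__(self):
        # The "recursive" state from the question: the instance itself plus the slot.
        return self, self.porridge

    def __setstate__(self, state):
        identity_checks.append(state[0] is self)
        self.update(state[0])  # updates the instance with itself: a no-op
        self.porridge = state[1]


a = B("oats")
a[42] = "answer"

# The tuple the pickler works on: reconstructor, its arguments, and the state.
func, args, state = a.__reduce__()
print(func.__name__)                           # _reconstructor
print(args[0] is B, args[1] is dict, args[2])  # True True {42: 'answer'}
print(state[0] is a, state[1])                 # True oats

# The memo turns the inner reference to a into a back-reference, so this terminates.
for proto in range(pickle.HIGHEST_PROTOCOL + 1):
    b = pickle.loads(pickle.dumps(a, protocol=proto))
    assert b == {42: "answer"} and b.porridge == "oats"
print(all(identity_checks))                    # True
```

The dictionary item shows up twice in the __reduce__() result, once in args[2] and once inside state[0], exactly as described above.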