I have a class
- whose instances have attributes that are containers
- which themselves contain containers, each containing many items
- that has an expensive initialization of these containers
I want to create copies of instances such that
- the container attributes are copied, rather than shared as references, but
- the containers within each container are not deeply copied, but are shared references
- a call to the class’s expensive __init__() method is avoided if possible
For an example, let’s use the class SetDict, below, which, when creating an instance, initializes a dictionary-like data structure as an attribute, d. d stores integers as keys and sets as values.
import collections

class SetDict(object):
    def __init__(self, size):
        self.d = collections.defaultdict(set)
        # Do some initialization; if size is large, this is expensive
        for i in range(size):
            self.d[i].add(1)
I would like to copy instances of SetDict, such that d is itself copied, but the sets that are its values are not deep-copied, and are instead only references to the sets.
For example, consider the current behavior of this class: copy.copy leaves the attribute d shared between the original and the copy, while copy.deepcopy creates completely new copies of the sets that are the values of d.
>>> import copy
>>> s = SetDict(3)
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> # Try a basic copy
>>> t = copy.copy(s)
>>> # Add a new key, value pair in t.d
>>> t.d[3] = set([2])
>>> t.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>> # But oh no! We unintentionally also added the new key to s.d!
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>>
>>> s = SetDict(3)
>>> # Try a deep copy
>>> u = copy.deepcopy(s)
>>> u.d[0].add(2)
>>> u.d[0]
set([1, 2])
>>> # But oh no! 2 didn't get added to s.d[0]'s set
>>> s.d[0]
set([1])
The behavior I’d like to see instead would be the following:
>>> s = SetDict(3)
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> t = copy.copy(s)
>>> # Add a new key, value pair in t.d
>>> t.d[3] = set([2])
>>> t.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1]), 3: set([2])})
>>> # s.d retains the same key-value pairs
>>> s.d
defaultdict(<type 'set'>, {0: set([1]), 1: set([1]), 2: set([1])})
>>> t.d[0].add(2)
>>> t.d[0]
set([1, 2])
>>> # s.d[0] also had 2 added to its set
>>> s.d[0]
set([1, 2])
This was my first attempt to create a class that would do this, but it fails due to infinite recursion:
class CopiableSetDict(SetDict):
    def __copy__(self):
        import copy
        # This version gives infinite recursion, but conveys what we
        # intend to do.
        #
        # First, create a shallow copy of this instance
        other = copy.copy(self)
        # Then create a separate shallow copy of the d
        # attribute
        other.d = copy.copy(self.d)
        return other
I’m not sure how to properly override the copy.copy (or copy.deepcopy) behavior to achieve this. I’m also not entirely sure if I should be overriding copy.copy or copy.deepcopy. How can I go about getting the desired copy behavior?
A class is a callable. When you call SetDict(3), SetDict.__call__ (inherited from its metaclass, type) first calls the constructor SetDict.__new__(SetDict) and then calls the initializer __init__(3) on the return value of __new__ if it’s an instance of SetDict. So you can get a new instance of SetDict (or any other class) without calling its initializer by just calling its constructor directly.

After that, you have an instance of your type and you can simply add regular copies of any container attributes and return it. Something like this should do the trick.
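A minimal sketch of that approach, assuming SetDict is defined as in the question and does not override __new__. The subclass name CopiableSetDict is borrowed from the question; this is one way to write it, not the only one.

```python
import collections
import copy


class SetDict(object):
    def __init__(self, size):
        self.d = collections.defaultdict(set)
        # Do some initialization; if size is large, this is expensive
        for i in range(size):
            self.d[i].add(1)


class CopiableSetDict(SetDict):
    def __copy__(self):
        cls = type(self)
        # Call the constructor directly, so the expensive __init__ is skipped.
        other = cls.__new__(cls)
        # Shallow-copy the container attribute: the new defaultdict is a
        # separate object, but its values are the very same set objects.
        other.d = copy.copy(self.d)
        return other
```

If the class had further attributes, they would have to be carried over as well, for instance with other.__dict__.update(self.__dict__) before replacing other.d.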
__new__ is a static method and requires the class to be constructed as its first argument. It should be as simple as this unless you’re overriding __new__ to do something, in which case you should show what it is so that this can be modified. Here’s the test code to demonstrate the behavior that you want.
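The script below is a sketch of such a check rather than a definitive test suite. The init_calls counter is a hypothetical addition, not part of the original class, used only to confirm that copying does not run __init__ again.

```python
import collections
import copy


class SetDict(object):
    init_calls = 0  # hypothetical counter, only here to observe __init__ calls

    def __init__(self, size):
        SetDict.init_calls += 1
        self.d = collections.defaultdict(set)
        for i in range(size):
            self.d[i].add(1)


class CopiableSetDict(SetDict):
    def __copy__(self):
        cls = type(self)
        other = cls.__new__(cls)      # no __init__ call here
        other.d = copy.copy(self.d)   # new dict, shared set values
        return other


s = CopiableSetDict(3)
t = copy.copy(s)

# Adding a key to t.d must not affect s.d.
t.d[3] = set([2])
new_key_isolated = (3 in t.d) and (3 not in s.d)

# Mutating a shared set through t.d must be visible through s.d.
t.d[0].add(2)
sets_shared = (t.d[0] is s.d[0]) and (s.d[0] == set([1, 2]))

# __init__ ran once for s and never for t.
init_skipped = (SetDict.init_calls == 1)

print(new_key_isolated)
print(sets_shared)
print(init_skipped)
```

All three flags should come out true if the copy behaves as described in the question.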