I’ve recently run into issues when creating NumPy object arrays using e.g.
a = np.array([c], dtype=object)
where c is an instance of some complicated class, and in some cases NumPy tries to access some methods of that class. However, doing:
a = np.empty((1,), dtype=object)
a[0] = c
solves the issue. I’m curious as to what the difference is between these two internally. Why, in the first case, might NumPy try to access some attributes or methods of c?
EDIT: For the record, here is example code that demonstrates the issue:
import numpy as np

class Thing(object):
    def __getitem__(self, item):
        print("in getitem")
    def __len__(self):
        return 1

a = np.array([Thing()], dtype='object')
This prints "in getitem" twice. Basically, if the class defines __len__, one can run into unexpected behavior.
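In the first case, a = np.array([c], dtype=object), numpy knows nothing about the shape of the intended array.
For example, when you build an array from a sequence d, you expect numpy to determine the shape based on the length of d.
So similarly in your case, numpy will attempt to see if len(c) is defined, and if it is, to access the elements of c via c[i].
You can see the effect by defining a class that implements both __len__ and __getitem__ and passing an instance, wrapped in a list, to np.array: the resulting array holds the values returned by __getitem__ rather than the instance itself.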
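As a minimal sketch of this effect, the hypothetical class Seq below (not from the original post) defines __len__ and __getitem__ and records every index it is asked for. It raises IndexError past the end so that any iteration over it terminates. The exact number of calls is deliberately not checked, since it may differ between NumPy versions.

```python
import numpy as np

class Seq(object):
    """Hypothetical sequence-like class, used only for illustration."""

    def __init__(self, n):
        self.n = n
        self.calls = []  # indices that were requested via __getitem__

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        self.calls.append(i)
        if not 0 <= i < self.n:
            raise IndexError(i)
        return "x" * i

s = Seq(3)
arr = np.array([s], dtype=object)

# The instance was treated as a sequence and unpacked into a second axis.
print(arr.shape)
# __getitem__ was called while the array was being built.
print(len(s.calls) > 0)
```

The array ends up holding the strings returned by __getitem__, not the Seq instance.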
In contrast, in your second case the shape of a has already been determined by the call to np.empty. Thus numpy can just directly assign the object.
However, to an extent this is true only because a is a vector. If it had been defined with a different shape, then method accesses will still occur: the assignment can end up calling __getitem__ on the class again.
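The contrast between the two kinds of assignment can be sketched as follows. The class Seq is a hypothetical stand-in, and the (1, 3) shape is an assumption chosen to match its length. Treat this as an illustration under those assumptions, not as a statement about every NumPy version.

```python
import numpy as np

class Seq(object):
    """Hypothetical sequence-like class, used only for illustration."""

    def __init__(self, n):
        self.n = n
        self.calls = []  # indices that were requested via __getitem__

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        self.calls.append(i)
        if not 0 <= i < self.n:
            raise IndexError(i)
        return "x" * i

s = Seq(3)

# 1-D object array: a[0] is a single slot, so the instance is stored as-is.
a = np.empty((1,), dtype=object)
a[0] = s
calls_after_vector = list(s.calls)
print(a[0] is s)
print(calls_after_vector)

# 2-D object array: b[0] is a whole row, so the instance is unpacked to fill it.
b = np.empty((1, 3), dtype=object)
b[0] = s
print(b[0, 2])
print(len(s.calls) > 0)
```

In the vector case the slot holds the instance itself. In the two-dimensional case the row is filled with the values that __getitem__ returned.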