Requirements:
- I need to grow an array arbitrarily large from data.
- I can guess the size (roughly 100-200), but there is no guarantee that the array will fit every time.
- Once it is grown to its final size, I need to perform numeric computations on it, so I’d prefer to eventually get to a 2-D numpy array.
- Speed is critical. As an example, for one of 300 files, the update() method is called 45 million times (taking about 150s) and the finalize() method is called 500k times (taking a total of 106s), for a total of about 250s.
Here is my code:
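import numpy as np

class A:
    def __init__(self):
        self.data = []
    def update(self, row):
        # amortized O(1): Python lists over-allocate as they grow
        self.data.append(row)
    def finalize(self):
        # one conversion to a 2-D array once all rows are collected
        dx = np.array(self.data)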
Other things I tried include the following code … but this is waaaaay slower.
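class A:
    def __init__(self):
        self.data = np.array([])
    def update(self, row):
        # np.append returns a new array (copying the old one), so assign it back
        self.data = np.append(self.data, row)
    def finalize(self):
        # view the flat array as rows of 5 values
        dx = np.reshape(self.data, (self.data.shape[0] // 5, 5))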
Here is a schematic of how this is called:
for i in range(500000):
    ax = A()
    for j in range(200):
        ax.update([1,2,3,4,5])
    ax.finalize()
    # some processing on ax
I tried a few different things, with timing.
The method you mention as slow: (32.094 seconds)
Regular ol Python list: (0.308 seconds)
Trying to implement an arraylist in numpy: (0.362 seconds)
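For reference, an arraylist in numpy can be sketched roughly like this: preallocate a buffer, track how many rows are filled, and double the capacity when it runs out. This is only a minimal sketch of the idea, not necessarily the code behind the timing above; the class name `ArrayList` and the `width` and `capacity` defaults are assumptions chosen to match the 5-column, 100-200 row case in the question.

```python
import numpy as np


class ArrayList:
    """Growable 2-D buffer that doubles its capacity whenever it fills up."""

    def __init__(self, width=5, capacity=100):
        # width and capacity are illustrative defaults (assumptions)
        self.data = np.zeros((capacity, width))
        self.capacity = capacity
        self.size = 0

    def update(self, row):
        if self.size == self.capacity:
            # Out of room: double the buffer and copy the filled rows over.
            self.capacity *= 2
            new_data = np.zeros((self.capacity, self.data.shape[1]))
            new_data[:self.size] = self.data[:self.size]
            self.data = new_data
        self.data[self.size] = row
        self.size += 1

    def finalize(self):
        # Return only the filled rows (a view, so no extra copy is made).
        return self.data[:self.size]


ax = ArrayList()
for j in range(200):
    ax.update([1, 2, 3, 4, 5])
dx = ax.finalize()
print(dx.shape)  # (200, 5)
```

Doubling keeps the number of reallocations logarithmic in the final size, which is why this lands close to the plain list instead of the np.append version.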
And this is how I timed it:
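A comparison like this can be timed with the standard timeit module. The snippet below is a sketch under assumptions, not the exact script that produced the numbers above: the function names `build_with_list` and `build_with_np_append` are hypothetical, and `number=100` is an arbitrary small repeat count to keep the run short.

```python
import timeit

import numpy as np

ROWS = 200  # rows per object, as in the schematic loop in the question


def build_with_list():
    # Append rows to a Python list, convert to a 2-D array once at the end.
    data = []
    for _ in range(ROWS):
        data.append([1, 2, 3, 4, 5])
    return np.array(data)


def build_with_np_append():
    # np.append copies the whole array on every call, so this is quadratic.
    data = np.array([])
    for _ in range(ROWS):
        data = np.append(data, [1, 2, 3, 4, 5])
    return data.reshape(-1, 5)


# number=100 is an assumption; raise it for more stable measurements
for fn in (build_with_list, build_with_np_append):
    seconds = timeit.timeit(fn, number=100)
    print(f"{fn.__name__}: {seconds:.3f} seconds")
```

Absolute times will differ by machine and numpy version; only the ratio between the two approaches is meaningful.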
So it looks like regular old Python lists are pretty good 😉