I generate a list of one-dimensional numpy arrays in a loop and later convert this list to a 2D numpy array. I would’ve preallocated a 2D numpy array if I knew the number of items ahead of time, but I don’t, so I put everything in a list.
A mock-up is below:
>>> from numpy import array, ones
>>> list_of_arrays = list(map(lambda x: x*ones(2), range(5)))
>>> list_of_arrays
[array([ 0., 0.]), array([ 1., 1.]), array([ 2., 2.]), array([ 3., 3.]), array([ 4., 4.])]
>>> arr = array(list_of_arrays)
>>> arr
array([[ 0., 0.],
       [ 1., 1.],
       [ 2., 2.],
       [ 3., 3.],
       [ 4., 4.]])
My question is the following:
Is there a better way (performance-wise) to collect sequential numerical data (in my case numpy arrays) than putting them in a list and then making a numpy.array out of it? That approach creates a new object and copies the data. Is there an “expandable” matrix data structure available in a well-tested module?
A typical size of my 2D matrix would be between 100×10 and 5000×10 floats.
EDIT: In this example I’m using map, but in my actual application I have a for loop.
Suppose you know that the final array arr will never be larger than 5000×10. Then you could pre-allocate an array of maximum size, populate it with data as you go through the loop, and then use arr.resize to cut it down to the discovered size after exiting the loop.
The tests below suggest doing so will be slightly faster than constructing intermediate Python lists, no matter what the ultimate size of the array is.
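As a rough illustration of the two approaches being compared, here is a minimal sketch. It is not the original test.py: the function names match the ones used in this answer, but their bodies, the constants MAX_ROWS and N_COLS, and the use of refcheck=False are assumptions made for the example.

```python
import numpy as np

MAX_ROWS = 5000  # assumed upper bound on the number of rows
N_COLS = 10      # assumed row width


def python_lists_to_array(n_rows):
    """Collect rows in a Python list, then convert once at the end."""
    rows = []
    for i in range(n_rows):
        rows.append(i * np.ones(N_COLS))
    return np.array(rows)  # allocates a new array and copies every row


def numpy_all_the_way(n_rows):
    """Fill a preallocated array, then shrink it to the rows actually used."""
    arr = np.empty((MAX_ROWS, N_COLS))
    count = 0
    for i in range(n_rows):
        arr[count] = i * np.ones(N_COLS)
        count += 1
    # In-place resize keeps the first `count` rows of the C-ordered buffer.
    # refcheck=False skips the reference-count check, which can otherwise
    # raise in interactive sessions or under a debugger.
    arr.resize((count, N_COLS), refcheck=False)
    return arr
```

Both functions should return equal arrays for any n_rows up to MAX_ROWS. Timing them (for example with timeit) on your own data is the only reliable way to see whether the difference matters for your sizes.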
Also, arr.resize de-allocates the unused memory, so the final (though maybe not the intermediate) memory footprint is smaller than what is used by python_lists_to_array.
This shows numpy_all_the_way is faster:
This shows numpy_all_the_way uses less memory:
test.py: