I am trying to create a 2D list in Python and have found two ways to do it:
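```python
import numpy

def cArray(size):
    # Pure-Python version: a list of `size` rows, each holding `size` zeros
    c = [[0. for i in range(size)] for j in range(size)]
    return c

def npArray(size):
    # numpy version: a single size x size array of float64 zeros
    arr = numpy.zeros((size, size))
    return arr
```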
Both functions give the correct result; the problem is performance. I timed both with timeit, and here are my results:
- List size: 5000 (i.e. a 5000 x 5000 grid)
- Number of runs: 5
- cArray average time: 3.73241295815 s
- npArray average time: 0.210782241821 s
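The exact timeit call isn't shown, so the following is only a sketch of one way to produce comparable averages. `average_time` is a hypothetical helper: `timeit.timeit` accepts a zero-argument callable and returns the total elapsed seconds for `number` calls, which it divides by the run count. numpy is treated as optional so the script still runs without it, and the demo uses size 1000 to stay quick; set `size = 5000` to match the figures above.

```python
import timeit

def cArray(size):
    # Pure-Python version from the question: size rows, each filled element by element
    return [[0. for i in range(size)] for j in range(size)]

def average_time(func, size, runs):
    """Hypothetical helper: mean seconds per call of func(size) over `runs` calls."""
    total = timeit.timeit(lambda: func(size), number=runs)
    return total / runs

if __name__ == "__main__":
    size, runs = 1000, 5  # the question used size 5000
    print("cArray average time:", average_time(cArray, size, runs))
    try:
        import numpy  # optional: only timed if it is installed
    except ImportError:
        numpy = None
    if numpy is not None:
        print("npArray average time:",
              average_time(lambda n: numpy.zeros((n, n)), size, runs))
```

Passing a callable rather than a code string avoids timeit's `setup`/`globals` plumbing, at the cost of one extra function call per run, which is negligible at these sizes.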
So I would obviously like to avoid the first solution, especially since this will be running for sizes up to 100k. However, I also don't want to pull in too many dependencies. Is there a way to create a 2D array efficiently without numpy? It doesn't need to match numpy's speed, as long as it's not 17 times slower.
You have to decide which matters more to you: speed or avoiding the dependency. numpy is faster precisely because it doesn't use the built-in Python types: a numpy array stores its values in one contiguous buffer of fixed-size machine types (64-bit floats by default for `numpy.zeros`), whereas a list of lists stores a reference to a boxed Python float for every element, and the comprehension fills it one element at a time in the interpreter. If your data are numeric and you're going to have 100k rows/columns, you will see a huge performance gain from numpy. If you want to avoid the numpy dependency, you will have to accept reduced performance. (You can always write your own Python libraries or C extensions optimized for your particular use case, but those then become dependencies like any other.)
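If you do stay dependency-free, the standard library at least lets you avoid the per-element Python loop in `cArray`. The sketch below assumes you only need a zero-filled grid; `list_grid`, `aliased_grid` and `array_grid` are hypothetical names for illustration. Building each row with list repetition is typically much faster than the nested comprehension, and the `array` module gives each row compact typed storage, but neither gives you numpy's vectorized operations, so expect it to remain slower; time it with timeit on your own sizes.

```python
from array import array

def list_grid(size):
    # Each row is built by list repetition ([0.0] * size), which runs in C,
    # instead of a per-element Python loop. The outer comprehension still
    # creates `size` independent row lists.
    return [[0.0] * size for _ in range(size)]

def aliased_grid(size):
    # Pitfall: repeating the outer list copies references to ONE row object,
    # so writing to grid[0][0] shows up in every row.
    return [[0.0] * size] * size

def array_grid(size):
    # array('d') stores raw C doubles in a contiguous buffer rather than
    # references to Python float objects; repetition builds a zeroed row.
    return [array('d', [0.0]) * size for _ in range(size)]

if __name__ == "__main__":
    good = list_grid(3)
    good[0][0] = 1.0
    print(good)  # [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]

    bad = aliased_grid(3)
    bad[0][0] = 1.0
    print(bad)   # [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

    rows = array_grid(3)
    print(rows[0])  # array('d', [0.0, 0.0, 0.0])
```

`aliased_grid` is included only to show the classic mistake that tempts people once they notice `[0.0] * size` is fast; the outer level must still create separate row objects, as `list_grid` does.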
Personally, I would recommend just using numpy. It is so widely used that anyone considering a library that works with multidimensional arrays of 100k rows/columns probably already has numpy installed.