I am trying to work with PyTables and NumPy.
Can you please tell me how much data the latter can handle?
I am currently handling data of 140 million rows and would like to know if NumPy can handle it. It would be nice if it could at least handle 140 million rows of 2 columns. Right now I use a 64-bit version of Windows with 8 GB of RAM.
If NumPy cannot handle this amount of data, what are the possible alternatives for implementing statistics and machine learning algorithms?
140M is much less than 2**31, so this should even fit in a 32-bit Python/NumPy given sufficient memory. You can easily try this out by allocating an array of that shape.
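A minimal sketch of such a test, assuming NumPy is imported as np. The helper name try_allocate is hypothetical and only there for illustration; it is not part of NumPy:

```python
import numpy as np

def try_allocate(n_rows, n_cols=2, dtype=np.float64):
    """Return a zero-filled (n_rows, n_cols) array, or None if memory runs out."""
    try:
        return np.zeros((n_rows, n_cols), dtype=dtype)
    except MemoryError:
        return None

# Small demonstration; pass n_rows=140_000_000 to test the real size.
a = try_allocate(1_000)
print(a.shape, a.dtype)
```

Calling try_allocate(140_000_000) performs the real test. Note that on some systems a zero-filled allocation may only be committed once the memory is written to, so a successful call is a good sign but probably not a full guarantee.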
The memory use with the standard dtype=np.float64 is on the order of 8 bytes × 140M × 2 = 2.24 GB, i.e. roughly 2 GB. If you use dtype=np.float32, you can save a factor of 2.
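The arithmetic can be checked without allocating anything by multiplying the element count by the item size of the dtype. This is only a sketch; the names N_ROWS, N_COLS and footprint_gb are made up for illustration:

```python
import numpy as np

N_ROWS, N_COLS = 140_000_000, 2

def footprint_gb(dtype):
    """Raw data size in GB (10**9 bytes) of an N_ROWS x N_COLS array."""
    return N_ROWS * N_COLS * np.dtype(dtype).itemsize / 1e9

print(footprint_gb(np.float64))  # 2.24
print(footprint_gb(np.float32))  # 1.12
```

This counts only the raw array data; any temporary copies made during a computation come on top of it.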