I work with Mathematica 8.0.1.0 on a 32-bit Windows 7 platform. I am trying to import data with
Import[file, "Table"]
which works fine as long as the file (the array in the file) is small enough. But for a bigger file (38 MB) holding a bigger array (9429 by 2052) I get the message:
No more memory available. Mathematica kernel has shut down. Try quitting other applications and then retry.
On my 64-bit Windows 7 platform with more main memory I can import bigger files, but I expect to hit the same problem there one day, once the file has grown and the array has more rows.
So I am trying to find a way to import big files. After searching for some time, I found a similar question here: Way to deal with large data files in Wolfram Mathematica.
But it seems that my Mathematica knowledge is not good enough to adapt the suggested OpenRead, ReadList or similar to my data (see the example file here).
The problem is that the rest of my program needs information about the array in the file, such as Dimensions and Max/Min in some columns and rows, and I perform operations on some columns and on every row.
But when I use e.g. ReadList, I never get the same information about the array as I get with Import (probably because I am doing it the wrong way).
Could somebody here give me some advice? I would appreciate any support!
For some reason, the current implementation of Import for the type Table (tabular data) is quite memory-inefficient. Below I've made an attempt to remedy this situation somewhat, while still reusing Mathematica's high-level importing capabilities (through ImportString). For sparse tables, a separate solution is presented, which can lead to very significant memory savings.
General memory-efficient solution
Here is a much more memory-efficient function:
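As a minimal sketch of the chunked approach described here (not necessarily the exact function used for the measurements in this answer), one can read a fixed number of lines at a time and parse each chunk with ImportString. The name readTableSketch and the default chunk size of 100 lines are assumptions made for illustration.

```mathematica
(* Sketch only: parse the file chunkSize lines at a time instead of all at once *)
ClearAll[readTableSketch];
readTableSketch[file_String?FileExistsQ, chunkSize_Integer?Positive : 100] :=
  Module[{stream, lines, chunks},
    stream = OpenRead[file];
    chunks = Reap[
        (* ReadList returns {} once the end of the file is reached *)
        While[(lines = ReadList[stream, String, chunkSize]) =!= {},
          (* Re-join the raw lines and let ImportString parse them as a table *)
          Sow[ImportString[StringJoin[Riffle[lines, "\n"]], "Table"]]]
      ][[2]];
    Close[stream];
    (* Concatenate the parsed chunks row-wise; an empty file gives {} *)
    If[chunks === {}, {}, Join @@ First[chunks]]]
```

The result is an ordinary nested list, so Dimensions, Max, Min and column extraction such as table[[All, 3]] should work as they do on the output of Import, provided the file is rectangular.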
Here I compare it with the standard Import, for your file:
You can see that my code is about 10 times more memory-efficient than Import, while being not much slower. You can control the memory consumption by adjusting the chunkSize parameter. Your resulting table occupies about 150-200 MB of RAM.
EDIT
Getting yet more efficient for sparse tables
I want to illustrate how one can make this function another 2-3 times more memory-efficient during the import, plus another order of magnitude more memory-efficient in terms of the final memory occupied by your table, by using SparseArrays. How much memory efficiency we gain depends strongly on how sparse your table is. In your example, the table is very sparse.
The anatomy of sparse arrays
We start with a generally useful API for construction and deconstruction of SparseArray objects:
Some brief comments are in order. Here is a sample sparse array:
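As a sketch, a sample array matching the description in the next paragraph can be constructed as follows. The element values 1 through 7 and the variable name sp are arbitrary choices for illustration; only the dimensions and the positions of the nonzero elements are taken from the text.

```mathematica
(* Hypothetical sample: a 3 x 5 sparse array with 7 nonzero elements *)
sp = SparseArray[{{1, 3} -> 1, {1, 5} -> 2, {2, 1} -> 3, {2, 5} -> 4,
     {3, 2} -> 5, {3, 4} -> 6, {3, 5} -> 7}, {3, 5}];

(* InputForm exposes the internal layout: dimensions, default element, index lists, data *)
InputForm[sp]
(* SparseArray[Automatic, {3, 5}, 0, {1, {{0, 2, 4, 7}, {{3}, {5}, {1}, {5}, {2}, {4}, {5}}}, {1, 2, 3, 4, 5, 6, 7}}] *)
```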
(I used a ToString-ToHeldExpression cycle to convert List[...] etc. in the FullForm back to {...} for ease of reading.) Here, {3,5} are obviously the dimensions. Next is 0, the default element. Next is a nested list, which we can denote as {1, {ic, jr}, sparseData}. Here, ic gives the cumulative number of nonzero elements as we add rows: it is first 0, then 2 after the first row, the second row adds 2 more, and the last adds 3 more. The next list, jr, gives the column positions of the nonzero elements in all rows, so they are 3 and 5 for the first row, 1 and 5 for the second, and 2, 4 and 5 for the last one. There is no confusion as to where each row starts and ends, since this can be determined from the ic list. Finally, we have sparseData, which is a list of the nonzero elements as read row by row from left to right (the ordering is the same as for the jr list). This explains the internal format in which SparseArrays store their elements, and hopefully clarifies the role of the functions above.
The code
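A simplified sketch of a sparse import is shown below. It assumes a rectangular, numeric table, and it swaps in a simpler technique than the one outlined above: each parsed chunk is converted with SparseArray and the pieces are concatenated with Join, rather than assembling the ic, jr and sparseData lists by hand. The name readSparseTableSketch is hypothetical.

```mathematica
(* Sketch only: convert each parsed chunk to a SparseArray before keeping it *)
ClearAll[readSparseTableSketch];
readSparseTableSketch[file_String?FileExistsQ, chunkSize_Integer?Positive : 100] :=
  Module[{stream, lines, pieces},
    stream = OpenRead[file];
    pieces = Reap[
        While[(lines = ReadList[stream, String, chunkSize]) =!= {},
          (* Only the small dense chunk is ever held in unpacked form *)
          Sow[SparseArray[
            ImportString[StringJoin[Riffle[lines, "\n"]], "Table"]]]]
      ][[2]];
    Close[stream];
    (* Join concatenates the sparse pieces row-wise *)
    If[pieces === {}, {}, Join @@ First[pieces]]]
```

Because only one dense chunk exists at a time, the peak memory in this sketch is governed by chunkSize and by the number of nonzero elements, not by the full dense size of the table.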
Benchmarks and comparisons
Here is the starting amount of used memory (fresh kernel):
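For reference, measurements of this kind can be taken with the built-in MemoryInUse, MaxMemoryUsed and AbsoluteTiming. The sketch below uses a stand-in workload in place of the actual import call, and the variable names are illustrative.

```mathematica
(* Baseline, ideally taken in a fresh kernel *)
memBefore = MemoryInUse[];

(* Stand-in workload; in practice this would be the import call being measured *)
{seconds, data} = AbsoluteTiming[Table[0, {1000}, {1000}]];

memAfter = MemoryInUse[];   (* bytes in use once the result is stored *)
peak = MaxMemoryUsed[];     (* peak bytes used so far in this kernel session *)

{seconds, memAfter - memBefore, peak}
```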
We call our function:
So, it is the same speed as readTable. How about the memory usage?
I think this is quite remarkable: we never used more than about twice as much memory as the file occupies on disk. But, even more remarkably, the final memory usage (after the computation finished) has been dramatically reduced:
This is because we use the SparseArray:
So, our table takes only 12 MB of RAM. We can compare it to our more general function:
The results are the same once we convert our sparse table back to normal:
while the normal table occupies vastly more space (it appears that ByteCount overcounts the occupied memory about 3-4 times, but the real difference is still at least an order of magnitude):
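The same kind of comparison can be sketched on a small hypothetical table. The data below are made up for illustration and are not the file from the question, so the byte counts will differ from the figures quoted above.

```mathematica
(* Hypothetical mostly-zero 1000 x 1000 table, built only for illustration *)
dense = Normal[
   SparseArray[{{1, 3} -> 1, {500, 200} -> 2, {1000, 1000} -> 3}, {1000, 1000}]];
sparse = SparseArray[dense];

Normal[sparse] === dense      (* True: same elements after converting back *)
ByteCount /@ {sparse, dense}  (* sizes in bytes as reported by ByteCount *)
```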