In Python, is opening a file handle slow compared to the actual write time?
One large file, one file handle:
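import marshal
# Sizes are illustrative: reduce the ranges to actually run this.
bigDataStructure = [[1000000.0 for j in range(1000000)] for i in range(1000000)]
# marshal writes bytes, so the file is opened in binary mode
with open('bigFile', 'wb') as f:
    marshal.dump(bigDataStructure, f)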
Multiple smaller files, multiple file handles:
for i, row in enumerate(bigDataStructure):
    # one file per row, named after the row index
    with open(str(i), 'wb') as f:
        marshal.dump(row, f)
You mention running out of memory if you merge them all — that’s a lot of neurons. (At least, in my experience, several hundred neurons were sufficient for the toy programs we wrote in my CS courses.)
You probably don’t want to create 100,000 separate files to store 100,000 separate neurons, and certainly not 1,000,000 files to store 1,000,000 neurons. The I/O overhead of directory lookups, of opening, reading, and closing each file, and of many tiny reads and writes will weigh heavily on loading and saving once you have a non-trivial number of neurons.
Of course, if you’re thinking 50 or 100 neurons, then it’ll go quickly no matter what, and perhaps the simplest implementation is called for.
But if this were mine, I’d look hard at building good data structures for the neurons. Perhaps all the neurons in a given level can be described by an integer that selects the neuron type, plus an array of integers or doubles that holds the per-neuron characteristics. A list of these level descriptions could then be written to separate files or to a single file, whichever is easier.
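As a rough sketch of that idea, a level could be a `(neuron_type, params)` pair, and a list of levels could go into one binary file. The names `save_levels` and `load_levels` and the file layout are my own assumptions for illustration, not an established format:

```python
import struct
from array import array


def save_levels(path, levels):
    """Write a list of (neuron_type, params) levels to a single binary file.

    Hypothetical layout: a level count, then for each level its type,
    the number of parameters, and the parameters as raw doubles.
    """
    with open(path, 'wb') as f:
        f.write(struct.pack('<I', len(levels)))
        for neuron_type, params in levels:
            values = array('d', params)  # stored in the machine's byte order
            f.write(struct.pack('<iI', neuron_type, len(values)))
            values.tofile(f)


def load_levels(path):
    """Read back the list of (neuron_type, params) levels."""
    levels = []
    with open(path, 'rb') as f:
        (count,) = struct.unpack('<I', f.read(4))
        for _ in range(count):
            neuron_type, n_params = struct.unpack('<iI', f.read(8))
            values = array('d')
            values.fromfile(f, n_params)
            levels.append((neuron_type, values.tolist()))
    return levels
```

One open and one sequential read then cover the whole network, however many neurons each level holds.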
If your neurons change types within a level, or aren’t fully connected between levels, you might find some sparse matrix storage designs useful for a larger data structure that can describe all the neurons at once.
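One simple sparse layout is a coordinate list that records only the non-zero connection weights. This is a minimal sketch; it assumes the connections between two levels are held as a dense list of rows, and the helper names are hypothetical:

```python
from array import array


def dense_to_coo(weights):
    """Keep only non-zero weights as parallel (row, column, value) arrays."""
    rows, cols, vals = array('I'), array('I'), array('d')
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            if w != 0.0:
                rows.append(i)
                cols.append(j)
                vals.append(w)
    return rows, cols, vals


def coo_to_dense(rows, cols, vals, shape):
    """Rebuild the dense weight matrix from the coordinate arrays."""
    n_rows, n_cols = shape
    dense = [[0.0] * n_cols for _ in range(n_rows)]
    for i, j, w in zip(rows, cols, vals):
        dense[i][j] = w
    return dense
```

The three arrays can be written to a single file with `array.tofile`, so storage grows with the number of actual connections rather than with the full matrix size.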
Maybe the real question should be “how do I improve the storage of my neurons?”
Update
I think even 10,000 neurons justifies making a “combined” storage format. I just created 10,000 tiny files, dropped my caches to test a cold start, and then re-read each one individually. It took 14.6 seconds to read in 10,000 files. It took only 0.1 seconds to read in a single file containing the same data as the 10,000 files.
If your network starts “cold” once a year or so, maybe it won’t matter much. But if your network starts cold a dozen times per day, you might grow to resent the simpler storage format.
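If you want to try a comparison like this yourself, something along these lines should do. It is only a sketch: the function name, file counts, and row sizes are placeholders, and it does not drop the OS caches, so it measures warm reads and will likely show a much smaller gap than the cold-start numbers above.

```python
import marshal
import os
import tempfile
import time


def compare_read_times(n_files=1000, row_len=10):
    """Time reading n_files small files versus one combined file."""
    rows = [[float(i)] * row_len for i in range(n_files)]
    with tempfile.TemporaryDirectory() as tmp:
        # Write one small file per row, plus one file holding every row.
        for i, row in enumerate(rows):
            with open(os.path.join(tmp, str(i)), 'wb') as f:
                marshal.dump(row, f)
        combined = os.path.join(tmp, 'combined')
        with open(combined, 'wb') as f:
            marshal.dump(rows, f)

        # Read the small files back one at a time.
        start = time.perf_counter()
        many = []
        for i in range(n_files):
            with open(os.path.join(tmp, str(i)), 'rb') as f:
                many.append(marshal.load(f))
        t_many = time.perf_counter() - start

        # Read the combined file in a single load.
        start = time.perf_counter()
        with open(combined, 'rb') as f:
            single = marshal.load(f)
        t_single = time.perf_counter() - start

    assert many == single == rows  # both layouts hold the same data
    return t_many, t_single


if __name__ == '__main__':
    t_many, t_single = compare_read_times()
    print('many files: %.4f s, single file: %.4f s' % (t_many, t_single))
```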