I have a very large graph stored in a single-dimensional array (about 1.1 GB), which I am able to hold in memory on my machine running Windows XP with 2 GB of RAM and 2 GB of virtual memory. I can generate the entire data set in memory; however, when I try to serialize it to disk using the BinaryFormatter, the file grows to about 50 MB and then I get an OutOfMemoryException. The code I am using to write it is the same code I use for all of my smaller problems:
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

StateInformation[] diskReady = GenerateStateGraph();
BinaryFormatter bf = new BinaryFormatter();
using (Stream file = File.OpenWrite(@"C:\temp\states.dat"))
{
    bf.Serialize(file, diskReady);
}
The search algorithm is very lightweight, and I am able to perform searches on this graph with no problems once it is in memory.
I really have 3 questions:
- Is there a more reliable way to write a large data set to disk? I guess you can define "large" as the point where the size of the data set approaches the amount of available memory, though I am not sure how accurate that is.
- Should I move to a more database-centric approach?
- Can anyone point me to some literature on reading portions of a large data set from a disk file in C#?
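Write the entries to the file yourself, one at a time, rather than serializing the whole array in one call. One simple solution is to serialize each StateInformation object on its own and write its length as four bytes, followed by its serialized bytes.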
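A minimal sketch of the writing side, under stated assumptions: `LengthPrefixedWriter` and the `serialize` delegate are hypothetical names, not part of the original answer. Each item is turned into bytes separately and written as a 4-byte length followed by those bytes. In the question's setting, `serialize` could run a `BinaryFormatter` over a `MemoryStream` for a single `StateInformation`; keeping it as a delegate separates the framing from the serializer, which matters because `BinaryFormatter` is obsolete and disabled in recent .NET versions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LengthPrefixedWriter
{
    // Writes each item as [4-byte length][payload bytes].
    // Only one item's serialized bytes are held in memory at a time.
    // 'serialize' is supplied by the caller (hypothetical; e.g. a BinaryFormatter
    // writing one object into a MemoryStream and returning ToArray()).
    public static void WriteAll<T>(string path, IEnumerable<T> items, Func<T, byte[]> serialize)
    {
        // File.Create truncates an existing file; File.OpenWrite would not.
        using (var file = File.Create(path))
        using (var writer = new BinaryWriter(file))
        {
            foreach (T item in items)
            {
                byte[] payload = serialize(item);
                writer.Write(payload.Length); // Int32 length prefix, 4 bytes
                writer.Write(payload);        // raw payload, no extra prefix
            }
        }
    }
}
```

Usage would look like `LengthPrefixedWriter.WriteAll(@"C:\temp\states.dat", diskReady, SerializeOne)`, where `SerializeOne` is a hypothetical helper returning the bytes for one object.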
This way, no more memory than a single StateInformation object needs is required at any time. To deserialise, you read four bytes, construct the length from them, create a buffer of that size, fill it from the file, and deserialise the buffer.
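The reading steps just described can be sketched as follows. As with the writer, `LengthPrefixedReader` and the `deserialize` delegate are hypothetical names chosen for illustration. Because the method is an iterator, records are produced one at a time as the caller enumerates them, so no single buffer ever has to hold the whole graph.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LengthPrefixedReader
{
    // Reads records written as [4-byte length][payload bytes].
    // 'deserialize' turns one payload back into an object (caller-supplied).
    public static IEnumerable<T> ReadAll<T>(string path, Func<byte[], T> deserialize)
    {
        using (var file = File.OpenRead(path))
        using (var reader = new BinaryReader(file))
        {
            while (file.Position < file.Length)
            {
                int length = reader.ReadInt32();          // the four-byte length
                byte[] buffer = reader.ReadBytes(length); // buffer of exactly that size
                if (buffer.Length != length)
                    throw new EndOfStreamException("Truncated record at end of file.");
                yield return deserialize(buffer);
            }
        }
    }
}
```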
All of the above could be seriously optimised for speed, memory use and disk size if you create a more specialised format, but it shows the principle.
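One possible kind of specialised format, sketched under loud assumptions: the question never shows the fields of `StateInformation`, so `StateRecord` with `Id`, `Parent` and `Cost` is entirely hypothetical. Writing each field directly with `BinaryWriter` gives every record the same size (12 bytes here), which avoids per-object serialization overhead and also lets you read a single record by seeking straight to `index * Size`, without loading the rest of the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical record layout; the real StateInformation fields are unknown.
struct StateRecord
{
    public int Id;
    public int Parent;
    public float Cost;
    public const int Size = 4 + 4 + 4; // bytes per record on disk
}

static class FixedRecordFile
{
    // Writes every record as three fixed-width fields, back to back.
    public static void Write(string path, IEnumerable<StateRecord> records)
    {
        using (var writer = new BinaryWriter(File.Create(path)))
        {
            foreach (var r in records)
            {
                writer.Write(r.Id);
                writer.Write(r.Parent);
                writer.Write(r.Cost);
            }
        }
    }

    // Reads only record 'index' by seeking directly to its byte offset.
    public static StateRecord ReadAt(string path, long index)
    {
        using (var file = File.OpenRead(path))
        using (var reader = new BinaryReader(file))
        {
            file.Seek(index * StateRecord.Size, SeekOrigin.Begin);
            return new StateRecord
            {
                Id = reader.ReadInt32(),
                Parent = reader.ReadInt32(),
                Cost = reader.ReadSingle()
            };
        }
    }
}
```

The trade-off is that variable-length data (for example, a list of successor states) no longer fits a fixed record and would need a separate file or an offset table.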