Does anyone have experience with storing data on disk? I have an in-memory modelling application that does calculations. The data is stored as lists of objects that have nested key-value collections like Dictionary<int, Dictionary<int, T>>.
Right now I use SQL Server as a persistence layer, but I use very few of its features. So I’m thinking I could write and read the data to disk myself to reduce dependencies and ease installation.
So I wrote a little routine that writes each array to disk in roughly the format below. The words “ObjId”, “Type”, “Valid” and “Count” are not actually in the file; they are the 1st, 2nd, 3rd and 4th int in the byte[], followed by the <int, T> pairs. The 52 comes from 4 * 4 + 3 * (4 + 8): 4 bytes per int, 8 per double.
Bytes: 52
    ObjId: 123
    Valid: 234
    Type: double
    Count: 3
        1 .23
        2 .34
        3 .45
In real life there’s no indentation etc, they’re all sequential bytes in a long stream.
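A minimal C# sketch of a record in that layout, for illustration only. The class name ArrayRecord, the type code TypeDouble = 1, and the field order (ObjId, Type, Valid, Count) are assumptions; it also assumes the leading byte count is a 4-byte int that is not itself included in the 52.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ArrayRecord
{
    // Hypothetical type code: the post only says the type is stored as an int.
    public const int TypeDouble = 1;

    // Assumed layout: [byte count][ObjId][Type][Valid][Count], then Count x (int key, double value).
    public static byte[] Serialize(int objId, int valid, IDictionary<int, double> values)
    {
        // 4 header ints plus one (int, double) pair per entry, e.g. 4 * 4 + 3 * (4 + 8) = 52.
        int payloadBytes = 4 * 4 + values.Count * (4 + 8);

        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(payloadBytes);
            writer.Write(objId);
            writer.Write(TypeDouble);
            writer.Write(valid);
            writer.Write(values.Count);
            foreach (KeyValuePair<int, double> pair in values)
            {
                writer.Write(pair.Key);
                writer.Write(pair.Value);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static Dictionary<int, double> Deserialize(byte[] bytes, out int objId, out int valid)
    {
        using (var reader = new BinaryReader(new MemoryStream(bytes)))
        {
            reader.ReadInt32(); // byte count, not needed when the whole record is already in memory
            objId = reader.ReadInt32();
            int type = reader.ReadInt32();
            if (type != TypeDouble)
                throw new NotSupportedException("Only double values are handled in this sketch.");
            valid = reader.ReadInt32();
            int count = reader.ReadInt32();

            var values = new Dictionary<int, double>(count);
            for (int i = 0; i < count; i++)
            {
                int key = reader.ReadInt32();
                values[key] = reader.ReadDouble();
            }
            return values;
        }
    }
}
```

With the three pairs from the example, Serialize produces the 4-byte count followed by the 52 payload bytes.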
This is fine for writing once. But when I want to insert an extra value somewhere in the middle, I have to rewrite the whole thing. I also can’t easily update a single value.
One alternative is to write each object to a separate file, so I would only have to rewrite that one file. But that seems quite inefficient, because I get files that are 1 kB in size but take up 4 kB on disk, so I’d be wasting space.
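The wasted space can be estimated with a quick calculation. This sketch assumes the file system allocates whole clusters of 4096 bytes and ignores any file-system-specific handling of very small files; the class and method names are made up for illustration.

```csharp
using System;

public static class DiskSlack
{
    // Bytes actually reserved on disk when space is handed out in whole clusters.
    public static long AllocatedBytes(long fileBytes, long clusterBytes)
    {
        if (fileBytes <= 0) return 0;
        long clusters = (fileBytes + clusterBytes - 1) / clusterBytes; // round up
        return clusters * clusterBytes;
    }

    // Bytes reserved but not used by the file's contents.
    public static long WastedBytes(long fileBytes, long clusterBytes)
    {
        return AllocatedBytes(fileBytes, clusterBytes) - fileBytes;
    }
}
```

Under those assumptions, a 1024-byte file occupies 4096 bytes and wastes 3072 of them, so three quarters of the allocated space is unused for every such file.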
So what do I need to do to be able to write to this file on disk incrementally? I know SQL Server has ‘pages’ where it writes data; is that the way to go?
Is there a ready-made library for this type of problem? Maybe some kind of virtual file that lets me treat the arrays as separate byte[] but handles the storage as a single physical file? Ideally compressed (pushing it, but who knows, I’ve been surprised before 🙂).
Thanks in advance,
Gert-Jan
If you don’t want the overhead of an RDBMS, you could use a key-value database like Berkeley DB. There is a C# interface for it here:
Berkeley DB for .NET
You can have one entry for each array and rewrite just that entry when the array changes. The rest of the database file is left unchanged, so it’s much faster than rewriting the whole file.
You can reuse the serialization logic you’ve already implemented to produce the value for each entry. All you need to add is a unique key for each array.
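To make the idea concrete, here is a rough C# sketch of the "one entry per array" pattern. It does not use the Berkeley DB API: IKeyValueStore and InMemoryStore are hypothetical stand-ins for whatever key-value library you pick, the Serialize method is a placeholder for your existing routine, and using the ObjId as the key is an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interface, not taken from any real library.
public interface IKeyValueStore
{
    void Put(int key, byte[] value);
    bool TryGet(int key, out byte[] value);
}

// In-memory stand-in so the sketch runs; a real store would persist to a single file.
public sealed class InMemoryStore : IKeyValueStore
{
    private readonly Dictionary<int, byte[]> entries = new Dictionary<int, byte[]>();

    public void Put(int key, byte[] value)
    {
        entries[key] = (byte[])value.Clone(); // replaces only this entry
    }

    public bool TryGet(int key, out byte[] value)
    {
        return entries.TryGetValue(key, out value);
    }
}

public static class ArrayStore
{
    // Placeholder for the serialization routine you already have.
    public static byte[] Serialize(double[] values)
    {
        var bytes = new byte[values.Length * sizeof(double)];
        Buffer.BlockCopy(values, 0, bytes, 0, bytes.Length);
        return bytes;
    }

    // One entry per array, keyed by the object's id (assumed to be unique).
    public static void Save(IKeyValueStore store, int objId, double[] values)
    {
        store.Put(objId, Serialize(values));
    }
}
```

Calling ArrayStore.Save again for the same objId after adding a value replaces only that entry; the entries for other arrays are not touched.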