I am looking for a fast (as in huge performance, not a quick fix) solution for persisting and retrieving tens of millions of small (around 1 KB) binary objects. Each object should have a unique ID for retrieval (preferably a GUID or SHA hash). Additional requirements are that it should be usable from .NET and that it shouldn't require any additional software installation.
Currently, I am using an SQLite database with a single table for this job, but I want to get rid of the overhead of processing simple SQL statements like SELECT data FROM store WHERE id = ?.
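For reference, the current approach amounts to something like the following minimal sketch (table and column names here are illustrative, not taken verbatim from my schema):

```python
import hashlib
import sqlite3

# One SQLite table keyed by a hash-based ID (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE store (id TEXT PRIMARY KEY, data BLOB)")

blob = b"\x00" * 1024                      # a ~1 KB binary object
obj_id = hashlib.sha1(blob).hexdigest()    # SHA-based unique ID
conn.execute("INSERT INTO store (id, data) VALUES (?, ?)", (obj_id, blob))

# The lookup whose SQL-parsing overhead I would like to eliminate:
row = conn.execute("SELECT data FROM store WHERE id = ?", (obj_id,)).fetchone()
```

It works, but every retrieval pays the cost of preparing and executing a SQL statement just to do what is logically a single key lookup.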
I’ve also tested direct filesystem persistence under NTFS, but performance degrades very quickly once the store reaches half a million objects.
P.S. Objects never need to be deleted, and the insertion rate is very low. In fact, every time an object changes, a new version is stored and the previous version remains; this is actually a requirement to support time-traveling.
Just adding some additional information to this thread:
To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem http://arxiv.org/abs/cs.DB/0701168
You may be able to lessen the performance problems of NTFS by breaking the object’s GUID identifier up into pieces and using them as directory names. That way, each directory only contains a limited number of subdirectories or files.
e.g. if the identifier is aaaa-bb-cc-ddddeeee, the path to the item would be c:\store\aaaa\bbcc\dddd\eeee.dat, limiting each directory to no more than 64k sub-items.