Is there a NoSQL (or other type of) database suitable for storing a large number (i.e. >1 billion) of “medium-sized” blobs (i.e. 20 KB to 2 MB)? All I need is a mapping from A (an identifier) to B (a blob), the ability to retrieve B given A, a consistent external API for access, and the ability to “just add another computer” to scale the system.
Something simpler than a database, e.g. a distributed key-value system, may be just fine, and I’d appreciate any thoughts in that vein as well.
Thank you for reading.
Brian
If your API requirements are purely along the lines of “Get(key), Put(key, blob), Remove(key)” then a key-value store (or, more accurately, a “persistent distributed hash table”) is exactly what you are looking for.
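To make the idea concrete, here is a minimal sketch of that Get/Put/Remove interface routed over a consistent-hash ring, which is one common way a distributed hash table decides which machine owns a key. Everything here is an assumption for illustration: the names HashRing and BlobStore are made up, the “nodes” are plain in-memory dicts standing in for real servers, and there is no persistence, replication, or data migration. It is not how any of the products mentioned in this thread is implemented.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent-hash ring (hypothetical helper): maps a key to a node name."""

    def __init__(self, nodes, points_per_node=64):
        self._points_per_node = points_per_node
        self._ring = []  # sorted list of (hash, node_name) points
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 is used only to spread keys evenly, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each node gets several points on the ring to even out the load.
        for i in range(self._points_per_node):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # The owner is the first point clockwise from the key's hash.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]


class BlobStore:
    """Get/Put/Remove facade; each 'node' is a dict standing in for a server."""

    def __init__(self, node_names):
        self._ring = HashRing(node_names)
        self._nodes = {name: {} for name in node_names}

    def put(self, key, blob):
        self._nodes[self._ring.node_for(key)][key] = blob

    def get(self, key):
        return self._nodes[self._ring.node_for(key)].get(key)

    def remove(self, key):
        self._nodes[self._ring.node_for(key)].pop(key, None)


store = BlobStore(["node-a", "node-b", "node-c"])
store.put("image-0001", b"\x89PNG...")
print(store.get("image-0001"))
store.remove("image-0001")
print(store.get("image-0001"))
```

The point of the ring is the “just add another computer” requirement: when a node is added, only the keys that now fall to the new node change owner, so most blobs stay where they are. A real system would also have to move those keys and keep replicas, which this sketch leaves out.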
There are quite a few of these available, but without additional information it is hard to make a solid recommendation. What OS are you targeting? Which language(s) are you developing with? What are the I/O characteristics of your app (cold/immutable data such as images, or high write loads such as tweets)?
Some of the KV systems worth looking into:
– MemcacheDB
– Berkeley DB
– Voldemort
You may also want to look into document stores such as CouchDB or RavenDB*. Document stores are similar to KV stores, but they understand the persistence format (usually JSON), so they can provide additional services such as indexing.
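As a rough illustration of that difference, the sketch below is a toy in-memory store that parses each value as JSON and keeps a secondary index on one field, which an opaque-blob KV store cannot do. The class name TinyDocumentStore and its methods are hypothetical, the indexed field is assumed to hold a simple scalar value, and none of this reflects the actual CouchDB or RavenDB APIs.

```python
import json
from collections import defaultdict


class TinyDocumentStore:
    """Toy document store (hypothetical): key -> JSON document, plus one field index."""

    def __init__(self, indexed_field):
        self._field = indexed_field
        self._docs = {}                  # key -> parsed document
        self._index = defaultdict(set)   # field value -> set of keys

    def put(self, key, json_text):
        self.remove(key)                 # drop any stale index entry first
        doc = json.loads(json_text)      # the store understands the format
        self._docs[key] = doc
        if self._field in doc:
            # Assumes the indexed value is hashable (string, number, bool).
            self._index[doc[self._field]].add(key)

    def get(self, key):
        return self._docs.get(key)

    def remove(self, key):
        doc = self._docs.pop(key, None)
        if doc is not None and self._field in doc:
            self._index[doc[self._field]].discard(key)

    def find(self, value):
        # Look up keys by field value without scanning every document.
        return sorted(self._index.get(value, ()))


docs = TinyDocumentStore(indexed_field="owner")
docs.put("img-1", '{"owner": "brian", "size_kb": 120}')
docs.put("img-2", '{"owner": "alice", "size_kb": 900}')
docs.put("img-3", '{"owner": "brian", "size_kb": 40}')
print(docs.find("brian"))
```

For purely opaque blobs of 20 KB to 2 MB this buys little, since there is nothing inside the value to index; it becomes useful if metadata needs to be queried alongside the blob.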