I would like to know the best solution for storing a large number of images across multiple servers, the way Google and Facebook do.
It seems that storing them in the filesystem is better than storing them inside a database, but what about using a NoSQL DB like Cassandra?
Do Google/Facebook store the same image on multiple servers for load balancing?
How does it work? What is the best solution?
Thanks a lot
There’s nothing wrong with the approach you’re taking. As mentioned, there are caveats; however, the possibility does exist, and many people and companies are successfully storing files in Apache Cassandra.
The principle behind this is to take a file, break it into a set of chunks, and store those chunks as columns in a row. When retrieving, you read each column back, reassemble the file, and voilà.
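A minimal sketch of the chunk-and-reassemble idea, in Python. A plain dict stands in for a Cassandra row (row key -> {chunk index -> bytes}), so no driver, keyspace, or schema is assumed; CHUNK_SIZE, split_into_chunks, reassemble, store_file, load_file, and fake_table are hypothetical names, and the 1 MB chunk size is an arbitrary assumption, not a Cassandra setting.

```python
from typing import Dict

# Hypothetical chunk size (assumption): 1 MB per column value.
CHUNK_SIZE = 1024 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Dict[int, bytes]:
    """Break a file's bytes into numbered chunks: chunk index -> bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return {i // chunk_size: data[i:i + chunk_size] for i in range(0, len(data), chunk_size)}


def reassemble(chunks: Dict[int, bytes]) -> bytes:
    """Concatenate the chunks in index order to rebuild the original bytes."""
    return b"".join(chunks[i] for i in sorted(chunks))


# In-memory stand-in for a Cassandra table of wide rows (assumption, not a real API):
# row key (e.g. an image path) -> {column name (chunk index) -> column value (bytes)}.
fake_table: Dict[str, Dict[int, bytes]] = {}


def store_file(row_key: str, data: bytes, chunk_size: int = CHUNK_SIZE) -> None:
    """'Write' each chunk as one column in the row identified by row_key."""
    fake_table[row_key] = split_into_chunks(data, chunk_size)


def load_file(row_key: str) -> bytes:
    """'Read' every column of the row and stitch the file back together."""
    return reassemble(fake_table[row_key])


if __name__ == "__main__":
    image = bytes(range(256)) * 10  # 2560 bytes of fake image data
    store_file("images/cat.jpg", image, chunk_size=1000)
    print(len(fake_table["images/cat.jpg"]))  # 3 chunks: 1000 + 1000 + 560 bytes
    assert load_file("images/cat.jpg") == image
```

Keying the columns by chunk index keeps the pieces ordered, so reassembly is just a sorted concatenation. In a real deployment you would probably also want to record the total length or chunk count alongside the chunks, so that a partially written file can be detected on read.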
Cassandra FAQ: large file and blob storage
Lucene indexes in Cassandra
You’ll get more positive feedback on the Cassandra mailing list and on the IRC channel.
Finally, this 2009 paper, written by folks at Facebook, should go some way toward answering the more fundamental questions you have: Cassandra – A Decentralized Structured Storage System.
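On the "same image on multiple servers" part of the question: that paper describes Cassandra partitioning data with consistent hashing and replicating each item on N nodes; under its simplest "Rack Unaware" policy, the replicas are the coordinator node plus its N-1 successors on the ring. The sketch below illustrates only that placement rule. It assumes MD5 as the hash and one token per node, and Ring, token, and replicas are hypothetical names; it is not Cassandra's actual code, and it says nothing about how Google or Facebook store their own images.

```python
import bisect
import hashlib
from typing import List


def token(key: str) -> int:
    """Map a key or node name to a position on the ring (MD5 is an assumption here)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Hypothetical consistent-hash ring with one token per node."""

    def __init__(self, nodes: List[str]) -> None:
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def replicas(self, key: str, n: int) -> List[str]:
        """Return the coordinator for `key` followed by its N-1 clockwise successors."""
        if not self._ring:
            raise ValueError("ring is empty")
        n = min(n, len(self._ring))  # cannot place more replicas than there are nodes
        # Coordinator: first node whose token is >= the key's token, wrapping to index 0.
        start = bisect.bisect_left(self._tokens, token(key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(n)]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c", "node-d"])
    # Three distinct nodes would each hold a copy of this image's row.
    print(ring.replicas("images/cat.jpg", 3))
```

Because every copy lives on a different node, reads for a popular image can be served by any of its replicas, and losing one node does not lose the image.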