I am developing a web system that handles a very large set of small images: about 100 million images of 50 KB to 200 KB each, stored on ReiserFS.
For now, it is very difficult to back up and sync such a large number of small files.
My question is whether it is a good idea to store these small images in a key/value store or another NoSQL database such as GridFS (MongoDB), Tokyo Tyrant, or Voldemort, to gain performance and get better backup support.
First off, have a look at this: "Storing a million images in the filesystem". While it isn’t about backups, it is a worthwhile discussion of the topic at hand.
And yes, large numbers of small files are pesky: they take up inodes, require space for filenames, etc. (and it takes time to back up all this metadata).
Basically it sounds like you have the serving of the files figured out; if you run it on nginx, with Varnish in front or similar, you can hardly make it any faster. Adding a database under that will only make things more complicated, including when it comes to backing up. So I would suggest working harder on an in-place filesystem backup strategy.
First off, have you tried rsync with the -az switches (archive and compression, respectively)? They tend to be highly effective, since rsync doesn’t transfer the same unchanged files again and again.
Alternatively, my suggestion would be to tar + gz the images into a number of archive files. In pseudo-code (and assuming you have them in different sub-folders):
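A minimal sketch of that loop in Python, not a definitive implementation. It assumes the images sit in immediate sub-folders of one root directory and that the backup directory lies outside that root; the function name archive_subfolders and both path parameters are hypothetical names chosen for illustration.

```python
import tarfile
from pathlib import Path


def archive_subfolders(image_root: Path, backup_dir: Path) -> list[Path]:
    """Write one gzip-compressed tar archive per immediate sub-folder.

    Assumption: backup_dir is NOT located inside image_root, otherwise
    the archives themselves would be picked up as a sub-folder.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    # Sort the sub-folders so repeated runs process them in the same order.
    for folder in sorted(p for p in image_root.iterdir() if p.is_dir()):
        archive_path = backup_dir / f"{folder.name}.tar.gz"
        with tarfile.open(archive_path, "w:gz") as tar:
            # arcname stores paths relative to image_root, not absolute ones.
            tar.add(folder, arcname=folder.name)
        archives.append(archive_path)
    return archives
```

Calling archive_subfolders(Path("/srv/images"), Path("/srv/backup")) (both paths are placeholders) would produce one archive per sub-folder, named after the sub-folder.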
This will create a number of .tar.gz files that can be transferred easily, without the per-file overhead of millions of small files.