I’m using MongoDB and we are really happy with it. Recently, however, our client asked us to estimate how large the database will be in the future.
We know how to calculate this for a typical relational database, but we don’t have much production experience with this NoSQL database.
Things that we know:
- db.namecollections.stats() gives us important information such as size (documents), avgObjSize (documents), storageSize, and totalIndexSize
(more here)
With size and totalIndexSize we can calculate the total size of the collection alone, but the big question here is:
- Why is there a difference between the collection size and storageSize?
How can we account for this difference when estimating the future database size?
MongoDB pads documents a bit so that they can grow a bit without having to be moved to the end of the collection on disk (an expensive operation).
Also, MongoDB pre-allocates data files, creating the next one and filling it with zeros before it is needed, to boost speed.
You can pass the --noprealloc flag to mongod to prevent that from happening.
If you want more info you can look here
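To get a feel for how much padding and allocation overhead you currently carry, you can compare storageSize against size from the stats output. Here is a minimal sketch in plain JavaScript. The numbers in the stats object are made up for illustration; in the mongo shell you would take them from db.namecollections.stats(). The helper name storageOverhead is my own, and treating storageSize + totalIndexSize as the on-disk footprint is an assumption that ignores any pre-allocated but still unused data files.

```javascript
// Hypothetical snapshot; real values would come from db.namecollections.stats().
const stats = {
  size: 400 * 1024 * 1024,          // bytes of document data
  storageSize: 500 * 1024 * 1024,   // bytes allocated to the collection
  totalIndexSize: 60 * 1024 * 1024, // bytes used by all indexes
};

// storageOverhead is an illustrative helper, not a MongoDB API.
function storageOverhead(s) {
  return {
    // How much larger the allocated storage is than the raw documents.
    overheadRatio: s.storageSize / s.size,
    // Assumed footprint: allocated collection storage plus indexes.
    diskFootprint: s.storageSize + s.totalIndexSize,
  };
}

const result = storageOverhead(stats);
console.log(result.overheadRatio, result.diskFootprint);
```

If the ratio stays roughly stable across a few measurements over time, you can reuse it as a multiplier in a growth estimate.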
As for calculating disk space 5 years out: if you can figure out an equation for the growth of your data and make some assumptions about your average document size and about how many and what kinds of indexes you will have, you might be able to come up with something.
Having worked for a bank as well, my suggestion would be to come up with an insane upper bound and then quadruple it. Money is cheap inside a bank; calculation mistakes are not.
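As a rough illustration of such an equation, here is a sketch in plain JavaScript. Everything in it is an assumption you would replace with your own figures: linear growth in document count, a fixed average document size, a fixed storage overhead ratio, a fixed index cost per document, and the function and parameter names themselves (projectDiskBytes and friends are hypothetical). The safety factor of 4 reflects the "quadruple it" advice.

```javascript
// Illustrative estimate only; assumes linear growth and constant averages.
function projectDiskBytes(p) {
  // Expected document count after the given number of months.
  const docs = p.docsNow + p.docsPerMonth * p.months;
  // Document data, scaled by the observed storageSize / size ratio.
  const dataBytes = docs * p.avgObjSize * p.overheadRatio;
  // Index cost, assumed to grow in proportion to document count.
  const indexBytes = docs * p.indexBytesPerDoc;
  // Apply the safety multiplier and round up to whole bytes.
  return Math.ceil((dataBytes + indexBytes) * p.safetyFactor);
}

const fiveYears = projectDiskBytes({
  docsNow: 1000000,     // made-up current document count
  docsPerMonth: 50000,  // made-up growth rate
  months: 60,           // 5 years
  avgObjSize: 400,      // bytes, e.g. from avgObjSize in stats()
  overheadRatio: 1.25,  // e.g. storageSize / size
  indexBytesPerDoc: 60, // e.g. totalIndexSize / document count
  safetyFactor: 4,      // the "quadruple it" margin
});
console.log((fiveYears / 1e9).toFixed(2) + " GB");
```

If your growth is not linear (seasonal peaks, a planned product launch), swap out the docs line for whatever model fits your data; the rest of the arithmetic stays the same.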