I’ve tested NoSQL databases like CouchDB, MongoDB and Cassandra and observed tendence to absorbing very large amount of drive space relative to inserted key-value pairs.
When comparing CouchDB and MySQL schemaless databases CouchDB is consuming much more drive space than MySQL.
I know about that key-value DBs by default are versioning and have long uuid and need key optimalisation – the comparison was between about 15 mln rows in MySQL and 1-5 mln documents listed NoSQL DB’s.
My question is : Is there any NoSQL with good compaction / compression of data?
So that I can have NoSQL database with a size closer to 5GB than 50GB?
MongoDB has a “database repair” function that also performs a compaction. However, such a compaction is not going to happen while the DB is running.
But if DB space is a serious issue, then try setting up a MongoDB master/slave pair. As the data needs compaction, run the repair on the slave, allow it to “catch up” and then switch them over. You can now safely compact the master instead.
But I have to echo jbellis‘s comment: you will probably need more space and most of these products are making the assumption that disk space is (relatively) cheap. If disk space is really tight, then you’ll find that MongoDB is reasonably sized, but it’s going to have a difficult time competing with tabular CSV data.
Think of it this way, what’s more space efficient?
Obviously the JSON is going to be longer b/c you’re repeating the field names every time. The only exception here is a CSV file with like 100 columns of which only a few are filled for each row. (but that’s probably not your data)