I have a collection in my MongoDB database that had Mongoid::Versioning enabled for it quite some time ago. Unfortunately, it made some of my documents extremely large; I see some that are over 711K. This makes for expensive disk I/O and slow read/write times. I am looking for a way to go through this collection (which has almost 2 million documents) and remove all Mongoid versioning, safely if possible. From what I can tell, Mongoid just stores the versions in an array attribute named, fittingly, `versions`. If there is a way to strip it from all of my documents without making the database completely unusable (in terms of performance while I do an entire disk scan plus write/update), that would be great.
There are a lot of ways to handle this situation. I've tried this a couple of different ways, and on a trial of ten thousand records they have similar processing times. I tried another and found it much worse. I'll include them all here in case it helps.
Here I am working on the hypothesis that batching the process up will help alleviate the impact on your database.
The first method would be to run finds on the collection with a limit, so that each find handles one batch.
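A minimal Ruby sketch of this limit-based loop, assuming the two database operations are passed in as lambdas so the batching logic can run without a server. The helper name `strip_versions_with_limit`, the `fetch`/`unset` lambdas, and the in-memory hash standing in for the collection are all hypothetical; the commented lines show roughly how the lambdas might be wired to the mongo Ruby driver (`find`, `projection`, `limit`, `update_many` with `$unset`), but they are not exercised here.

```ruby
# Hypothetical helper: repeatedly fetch up to batch_size ids of documents
# that still have a versions array, unset versions on them, and stop once a
# fetch comes back empty. The database calls are injected as lambdas.
#
# Assumed wiring to the mongo Ruby driver (illustrative, not run here):
#   fetch = ->(n) {
#     coll.find({ versions: { '$exists' => true } })
#         .projection(_id: 1).limit(n).map { |d| d['_id'] }
#   }
#   unset = ->(ids) {
#     coll.update_many({ _id: { '$in' => ids } }, { '$unset' => { versions: '' } })
#   }
def strip_versions_with_limit(batch_size, fetch:, unset:, pause: 0)
  total = 0
  loop do
    ids = fetch.call(batch_size)
    break if ids.empty? # nothing left with a versions array
    unset.call(ids)
    total += ids.size
    sleep(pause) if pause > 0 # optional breather between batches
  end
  total
end

# In-memory stand-in for the collection: _id => document.
docs = (1..25).to_h { |i| [i, { 'title' => "doc #{i}", 'versions' => [{ 'v' => 1 }] }] }
docs[3].delete('versions') # pretend one document was never versioned

# Scans the whole stand-in on every call, like an unindexed query would.
fetch = ->(n) { docs.select { |_, d| d.key?('versions') }.keys.first(n) }
unset = ->(ids) { ids.each { |id| docs[id].delete('versions') } }

removed = strip_versions_with_limit(10, fetch: fetch, unset: unset)
puts removed # prints 24
```

Because each pass only asks for documents that still have `versions`, the loop ends on its own once a fetch returns nothing; the optional `pause` is one hypothetical way to give the server some breathing room between batches.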
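The issue here will be the collection scan required for every new batch. The limit helps reduce the impact, but it is still costly on the collection: unless `versions` is indexed, each find has to walk the collection until it has gathered a batch of documents that still have a `versions` array, typically skipping past more already-cleaned documents with every batch.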
A second method would be to fill an array with the `_id`s of all the documents that have a `versions` array, then iterate through that array and update them. This will mean an initial full collection scan, but after that point it is all iterating through the array and updating in batches.
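Sketched the same way (hypothetical helper `strip_versions_from_ids`, an in-memory hash in place of the collection, driver calls only in comments and untested here), the collect-then-slice approach can lean on Ruby's `each_slice` to cut the id array into fixed-size batches.

```ruby
# Hypothetical helper: given the _ids collected up front, unset versions in
# fixed-size slices so each update touches at most batch_size documents.
# Returns the number of batches issued.
#
# Assumed wiring to the mongo Ruby driver (illustrative, not run here):
#   ids = coll.find({ versions: { '$exists' => true } })
#             .projection(_id: 1).map { |d| d['_id'] }
#   unset = ->(batch) {
#     coll.update_many({ _id: { '$in' => batch } }, { '$unset' => { versions: '' } })
#   }
def strip_versions_from_ids(ids, batch_size, unset:)
  batches = 0
  ids.each_slice(batch_size) do |batch|
    unset.call(batch)
    batches += 1
  end
  batches
end

# In-memory stand-in for the collection: _id => document.
docs = (1..25).to_h { |i| [i, { 'title' => "doc #{i}", 'versions' => [{ 'v' => 1 }] }] }
docs[3].delete('versions') # pretend one document was never versioned

# The single up-front scan: the only full pass over the collection.
ids = docs.select { |_, d| d.key?('versions') }.keys
unset = ->(batch) { batch.each { |id| docs[id].delete('versions') } }

batches = strip_versions_from_ids(ids, 10, unset: unset)
puts batches # prints 3
```

The trade-off is that every matching `_id` has to be held in memory at once, which for close to 2 million documents is worth checking before running it.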
I've tried a third method, where I work through the collection finding an `_id` greater than the previous one and updating, but found this to be much more expensive (even though it was able to use the index on `_id`). I'm adding it here in case it is useful.
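For comparison, a batched variant of the `_id`-range walk, under the same assumptions (hypothetical helper `strip_versions_by_id_range`, in-memory stand-in, driver calls only in comments and untested). Combining the `versions` filter with the `_id` range and a limit is a choice made for this sketch, not necessarily how the method above was written.

```ruby
# Hypothetical helper: walk the collection in _id order, asking each time for
# the next batch of versioned documents with an _id greater than the last one
# seen, and unset versions on them. Returns how many documents were cleaned.
#
# Assumed wiring to the mongo Ruby driver (illustrative, not run here):
#   fetch_after = ->(last, n) {
#     filter = { versions: { '$exists' => true } }
#     filter[:_id] = { '$gt' => last } if last
#     coll.find(filter).sort(_id: 1).limit(n).projection(_id: 1).map { |d| d['_id'] }
#   }
def strip_versions_by_id_range(batch_size, fetch_after:, unset:)
  last_id = nil
  total = 0
  loop do
    ids = fetch_after.call(last_id, batch_size)
    break if ids.empty?
    unset.call(ids)
    total += ids.size
    last_id = ids.last # resume strictly after this _id next time
  end
  total
end

# In-memory stand-in for the collection: _id => document.
docs = (1..25).to_h { |i| [i, { 'title' => "doc #{i}", 'versions' => [{ 'v' => 1 }] }] }
docs[3].delete('versions') # pretend one document was never versioned

# Sorting the keys stands in for walking the _id index in ascending order.
fetch_after = lambda do |last, n|
  docs.keys.sort
      .select { |id| (last.nil? || id > last) && docs[id].key?('versions') }
      .first(n)
end
unset = ->(ids) { ids.each { |id| docs[id].delete('versions') } }

removed = strip_versions_by_id_range(10, fetch_after: fetch_after, unset: unset)
puts removed # prints 24
```

Each query resumes strictly after the last `_id` it saw, so already-cleaned documents are never revisited; whether that pays off depends on the workload, and the timing observation above suggests it may not.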