There is a Mongo collection with more than 5 million items. I need to get a “representation” (held in a variable, written to a file on disk, anything at this point) of a single attribute of every document.
My query is something like this:
cursor = db.collection.find({"conditional_field": {"subfield": True}}, {"field_i_want": True})
My first, silly attempt was to pickle ‘cursor’, but I quickly realized it doesn’t work like that.
In this case, “field_i_want” contains an integer. As an example of something I’ve tried, I ran the following, which practically locked up the server for several minutes:
ints = [i['field_i_want'] for i in cursor]
… to just get a list of the integers. This hogged CPU resources on the server for far too long.
Is there a remotely simple way to retrieve these results into a list, tuple, set, pickle, file, something, that won’t totally hog the CPU?
Ideally I could dump the results to be read back in later. But I’d like to be as kind as possible while dumping them.
I think that streaming the results is likely to help here: don’t hold everything in memory if you don’t have to.
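As a rough sketch of what streaming could look like: iterate over the cursor one document at a time and write each value straight to a file, instead of building a list first. The helper names dump_field and read_field and the file name ints.txt are made up for illustration, and a plain generator stands in for the real cursor so the snippet runs without a database. This mainly limits memory on the client side. Whether it also eases the load on the server is an assumption, not something established above.

```python
import os
import tempfile


def dump_field(docs, path, field="field_i_want"):
    """Write one field from an iterable of documents to a text file, one value per line.

    `docs` can be any iterable of dict-like documents, such as a pymongo cursor.
    Only one document is held at a time. Returns the number of values written.
    """
    count = 0
    with open(path, "w") as out:
        for doc in docs:
            out.write(f"{doc[field]}\n")
            count += 1
    return count


def read_field(path):
    """Lazily read the integers back, one line at a time."""
    with open(path) as src:
        for line in src:
            yield int(line)


if __name__ == "__main__":
    # Stand-in for the real cursor (hypothetical data, not from the collection).
    fake_cursor = ({"field_i_want": n} for n in range(5))
    path = os.path.join(tempfile.mkdtemp(), "ints.txt")
    written = dump_field(fake_cursor, path)
    print(written, "values written to", path)
    print(sum(read_field(path)), "is the sum, computed without loading a list")
```

With the real collection, the cursor from the query above would be passed in place of fake_cursor. Adding "_id": False to the projection and calling .batch_size(...) on the cursor (both exist in pymongo) may reduce the data sent per round trip, though the right batch size would have to be found by experiment.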