I have a Mongo collection where each document has a set of unique embedded keys:
{
  Facebook: {
    Archived: 'False',                                  // non-unique
    'fan_count_December_19_2011': 12345,                // unique
    'unique_views_count_December_19_2011': 12345,       // unique
    'post_count_December_19_2011': 12345,               // unique
    ...
    ...
  }
}
We look up these documents with the following query:
db.metrics.find({
  'Facebook.fan_count_December_19_2011': {'$ne': null},
  'Archived': 'False'
}).limit(1)
The problem is that, with 6,000 such documents, it’s a little slow. According to the explain() output, each query takes about 0.06 seconds on average and does a full collection scan every time.
Our service has to run the above query about 100 times (once for each of 100 distinct keys), which at 0.06 seconds per query adds up to 6 seconds per call (not including the overhead of the site serving the data).
Sending all the keys over in one batch and doing one large query would require a major rewrite of the data layer, which I’m trying to avoid because of a tight upcoming deadline.
I’ve been looking through the documentation, and there doesn’t seem to be a way to have a key-based index. The documentation says you can index on an embedded key, but that seems to index only the values. It also doesn’t do me much good: since each key in the system is unique, there would have to be a separate index for each new key.
Short of redesigning our document structure (which would require a major change), is there anything I can do to speed up this query against the existing collection in its current format?
Any constructive input is greatly appreciated.
Thanks,
Frank
Assuming you set your Archived field to true after you have processed a document, you could create an index on just the Archived field.
Normally you wouldn’t create an index on a field with low cardinality, but it might work for you in this case, provided there are not very many documents where the Archived field is false. The index lets the query skip the archived documents, so only the remaining ones have to be checked for the key.
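As a rough sketch, the index and the matching lookup could look like the following. The helper names ensureArchivedIndex and findFirstWithKey are made up for illustration, and the code assumes Archived sits at the top level, as in the question's query; if it is nested as in the sample document, the path would be 'Facebook.Archived' instead. The db argument is expected to be the shell's database handle (or anything exposing the same collection methods).

```javascript
// Hypothetical helpers; `db` is assumed to behave like the mongo shell's handle.

// Create a single-field ascending index on Archived.
// (Older shells used ensureIndex for the same purpose.)
function ensureArchivedIndex(db) {
  return db.metrics.createIndex({ Archived: 1 });
}

// Build the same filter as the question's query for one date-stamped key.
// Assumption: Archived is a top-level field holding the string 'False'.
function findFirstWithKey(db, key) {
  const filter = { Archived: 'False' };
  filter['Facebook.' + key] = { $ne: null };
  return db.metrics.find(filter).limit(1);
}

// In the shell, usage would be along these lines:
//   ensureArchivedIndex(db);
//   findFirstWithKey(db, 'fan_count_December_19_2011');
```

Running explain() on the query before and after creating the index should show whether it is actually being used.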
In the longer run you should redesign your document so you don’t have so many unique field names (something along the lines of Iain’s suggestion of a “Facebook.date” field). That way you have something you can create an index on.
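To give an idea of what such a redesign might look like, here is a minimal sketch that splits one of the current documents into one document per date, with the date moved out of the key names and into a "Facebook.date" field. The function name splitByDate and the target shape are assumptions for illustration, not something taken from your code, and the pattern assumes keys always end in a date of the form Month_DD_YYYY.

```javascript
// Hypothetical migration helper: turns date-stamped keys such as
// 'fan_count_December_19_2011' into { date: 'December_19_2011', fan_count: ... }.
function splitByDate(doc) {
  // Assumed key layout: <metric name>_<Month>_<day>_<year>
  const datedKey = /^(.+)_([A-Z][a-z]+_\d{1,2}_\d{4})$/;
  const shared = {};          // fields without a date suffix, e.g. Archived
  const byDate = new Map();   // date string -> metrics recorded for that date

  for (const [key, value] of Object.entries(doc.Facebook)) {
    const match = datedKey.exec(key);
    if (!match) {
      shared[key] = value;
      continue;
    }
    const [, metric, date] = match;
    if (!byDate.has(date)) byDate.set(date, {});
    byDate.get(date)[metric] = value;
  }

  // One output document per date, each carrying the shared fields.
  return [...byDate.entries()].map(([date, metrics]) => ({
    Facebook: { ...shared, date, ...metrics }
  }));
}

const sample = {
  Facebook: {
    Archived: 'False',
    fan_count_December_19_2011: 12345,
    unique_views_count_December_19_2011: 12345,
    post_count_December_19_2011: 12345
  }
};
console.log(JSON.stringify(splitByDate(sample), null, 2));
```

With documents in that shape, a single index such as { 'Facebook.date': 1 } would cover every date, instead of needing a new index for each new key.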