I created a collection in MongoDB consisting of 11446615 documents.
Each document has the following form:
{
"_id" : ObjectId("4e03dec7c3c365f574820835"),
"httpReferer" : "http://www.somewebsite.pl/art.php?id=13321&b=1",
"words" : ["SEX", "DRUGS", "ROCKNROLL", "WHATEVER"],
"howMany" : 3
}
httpReferer: just a URL
words: words parsed from the URL above. The size of the list is between 15 and 90.
I am planning to use this database to obtain a list of web pages that have similar content.
I’ll be querying this collection using the words field, so I created (or rather started creating) an index on this field:
db.my_coll.ensureIndex({words: 1})
I started creating the index about 3 hours ago and it doesn’t look like it will finish within another 3 hours.
How can I speed up indexing? Or maybe I should use a completely different approach to this problem? Any ideas are welcome 🙂
Nope, indexing is slow for large collections. You can create the index in the background as well:
db.my_coll.ensureIndex({words:1}, {background:true});
Creating the index in the background will be slower and result in a larger index. However, it won’t be used until the indexing is complete, so in the meantime you’ll be able to use the database normally and the indexing won’t block.
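To get a feel for why the build takes so long: an index on an array field like words is a multikey index, which stores roughly one key per array element rather than one per document. Here is a rough back-of-the-envelope sketch in plain JavaScript, using the figures from the question. The helper name estimateIndexEntries is made up for illustration, and it assumes the words within each document are distinct, so treat the result as an order-of-magnitude estimate only.

```javascript
// Rough estimate of how many keys a multikey index on `words` might hold.
// Assumption: about one index key per array element per document.
// `estimateIndexEntries` is a hypothetical helper, not a MongoDB API.
function estimateIndexEntries(docCount, minWords, maxWords) {
  return {
    min: docCount * minWords, // every document has the shortest list
    max: docCount * maxWords, // every document has the longest list
  };
}

// Figures from the question: 11446615 documents, 15 to 90 words each.
const estimate = estimateIndexEntries(11446615, 15, 90);
console.log(estimate.min); // 171699225
console.log(estimate.max); // 1030195350
```

So the index would hold somewhere between about 170 million and about 1 billion keys, which goes some way toward explaining a build that runs for hours.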