I have a model pretty much like this:
Document Release
- has an embedded array ReleaseDetails[]
- The ReleaseDetails array contains documents of type ReleaseDetails
- A ReleaseDetails document has a field called ArtistName of type text
- A ReleaseDetails document has a field called Type of type text
I basically want to do this:
retrieve all the Release documents that have an entry in their ReleaseDetails array for which both ArtistName matches someRegexExpression AND Type matches someOtherRegexExpression. This is what I do:
db.getCollection("releasesCollection").find({
    "ReleaseDetails": {
        "$elemMatch": {
            "ArtistName": { $regex: "^David" },
            "Type": { $regex: ".*singer.*" }
        }
    }
})
The problem is that when I call explain() on such a query, I can see that the indexes I’ve made on ReleaseDetails.ArtistName and ReleaseDetails.Type are effectively not taken into account (the query just goes through all documents in the collection).
On the other hand, if I do the exact same query but replace the regex expressions with actual values, in other words, if I do this:
db.getCollection("releasesCollection").find({
    "ReleaseDetails": {
        "$elemMatch": {
            "ArtistName": "David Halliday",
            "Type": "mainSinger"
        }
    }
})
then the indexes ARE taken into account (explain() shows that clearly).
My question is then, is there a way to have a query that does $elemMatch WITH regex take advantage of the indexes?
(I’m asking because I’ve also seen that if you do a regex query on a basic field (like a text field, not an embedded-array field) AND that field is indexed, the regex query will in fact take advantage of the index. Why does a regex query on a basic indexed field use the index, while a regex query on an embedded-array indexed field fails to use it?)
Two important things you may have missed:
1. Only a case-sensitive prefix regexp (one anchored with ^) can make efficient use of an index in MongoDB; other patterns cannot.
For example, the following query will use the index:
db.users.find({ "name": /^andrew/ })
2. A query generally uses only one index, so it is better to create a single compound index for your query, covering both ReleaseDetails.ArtistName and ReleaseDetails.Type.
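As a rough sketch of point 2, the index specification and an index-friendlier query could look like the plain objects below. The collection and field names are taken from the question; the variable names indexSpec and query are only illustrative, and replacing the Type regexp with the exact value "mainSinger" is an assumption about what the data allows. Whether the planner actually picks the index should still be checked with explain().

```javascript
// Illustrative sketch only: plain objects that could be passed to the mongo shell.
// Field names come from the question; the variable names are hypothetical.

// Compound index spec covering both fields of the embedded documents.
const indexSpec = { "ReleaseDetails.ArtistName": 1, "ReleaseDetails.Type": 1 };

// Query that keeps the prefix regexp on ArtistName and uses an exact value for Type.
const query = {
  ReleaseDetails: {
    $elemMatch: {
      ArtistName: { $regex: "^David" }, // case-sensitive prefix pattern
      Type: "mainSinger"                // exact match instead of ".*singer.*" (assumption)
    }
  }
};

// In the mongo shell these would be used roughly as:
//   db.getCollection("releasesCollection").createIndex(indexSpec)
//   db.getCollection("releasesCollection").find(query).explain()
```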
And to take advantage of MongoDB indexes you should not use a "like" regexp such as "Type" : {$regex:".*singer.*"} (this regexp is probably why your query does not use the index).
If you really need a "like" search, you can tokenize Type yourself and store it as an array. For example, if you have the following Type: “My favorite singer”, you can:
- tokenize by words: [my, favorite, singer]
- tokenize by word parts to support a "like" search, like this: [my, fav, favo, favor, favori, favorit, favorite, avorite, vorite, orite, rite, ite, avorit, vori] (I’ve skipped tokenizing the word "singer")
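A minimal sketch of such a tokenizer is shown below. The answer does not prescribe an algorithm, so the function names (wordTokens, substrings, likeTokens) and the minimum part length of 3 are assumptions made for illustration. It generates every substring of each word, which is more exhaustive than the abbreviated list above.

```javascript
// Hypothetical helpers; names and the minLength default are illustrative assumptions.

// Split a string into lowercase word tokens for exact-word matching.
function wordTokens(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Every substring of `word` with at least `minLength` characters, deduplicated.
function substrings(word, minLength = 3) {
  const parts = new Set();
  for (let start = 0; start < word.length; start++) {
    for (let end = start + minLength; end <= word.length; end++) {
      parts.add(word.slice(start, end));
    }
  }
  return [...parts];
}

// Tokens to store in an array field: whole words plus their substrings.
function likeTokens(text, minLength = 3) {
  const tokens = new Set();
  for (const word of wordTokens(text)) {
    tokens.add(word); // keep short words such as "my" as exact tokens
    for (const part of substrings(word, minLength)) tokens.add(part);
  }
  return [...tokens];
}

const tokens = likeTokens("My favorite singer");
console.log(tokens.includes("singer"), tokens.includes("vori"));
```

The resulting array could be stored in its own indexed field (for instance a hypothetical TypeTokens field) and queried with an exact value instead of a regexp. The token list grows quadratically with word length, so the minimum length is worth tuning.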
For algorithms to tokenize words, you can read about full-text search engines such as Lucene and Sphinx.