i have a mongo collection like this:
{
"A2_AboutMe": "",
"A2_Attributes": "|av|nv|",
"A2_Birthday": "",
"A2_DateCreated": "2010-11-25 22: 59: 00",
"A2_DateLast": "2011-11-18 12: 09: 36",
"A2_FK_A1_IDPerson": "0",
"A2_Firstname": "José Luis",
"A2_FirstnameC": "Jose Luis",
"A2_Gender": "m",
"A2_IDProfile": "1",
"A2_Keywords": "...|..",
"A2_Lastname": "test - test",
"A2_LastnameC": "_test test",
"A2_Locale": "",
"A2_Middlename": "",
"A2_Name": "José Luis test",
"A2_NameC": "Jose Luis test",
...
}
with indexies on A2_LastnameC and A2_FirstnameC
3.000.000 docs in this collection, 8 GB data storage
following query(PHP) in done in 3-4 sec
$collection->find(array(«A2_FirstnameC» => new MongoRegex("/jose/i")))->sort(array(«A2_LastnameC» => -1))->limit(10)
but sometimes the similar queries are done in less than 100 msec.
what can i do to get this performance each time?
test computer is i7, 8GB Ram(7 is used by mongo), Windows 7
First of all index won’t be used for non-prefix-like, case-insensitive regular expressions. But in the query above index can be used for sorting by
A2_LastnameCfield so this is fast. Now having the sorted data MongoDB will need to getA2_FirstnameCvalue and match it against the regexp stopping when there’s 10 matches ready (it will be also relatively fast because it will use index to retrieve the data instead of reading whole documents from disk). Depending on data order it can happen to match the first 10 documents – this is the best case and it will be very fast, the worst case would be the matches to occur on the last 10 docs having to scan all the previous index entries.How to speed this up? Either use query that can use index, like:
«A2_FirstnameC» => new MongoRegex("/^jose/"). Or you have to use some kind of full-text search. A simple way would be to split the field (A2_Firstnamein your case) into words, normalize them (convert to lower case, replace accents) and store as an array. Now an index for the array field will be used to do fast searches.