I’m testing a few NoSQL solutions, focusing mainly on read performance. Today was MongoDB day.
The test machine is a VM with a quad-core Xeon @ 2.93 GHz and 8 GB of RAM.
I’m testing with only one database and a single collection of ~100,000 documents. Each BSON document is roughly 20 KB.
The managed object I’m working with is:
```csharp
private class Job
{
    public int Id { get; set; }
    public string OrganizationName { get; set; }
    public List<string> Categories { get; set; }
    public List<string> Industries { get; set; }
    public int Identifier { get; set; }
    public string Description { get; set; }
}
```
The test process:

- Create 100 threads.
- Start all threads.
- Each thread reads 20 random documents from the collection.
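For scale, a rough estimate of the payload one test run pulls from the server, taking every document as about 20 KB and ignoring wire-protocol overhead:

$$
20 \times 20\ \text{KB} = 400\ \text{KB per thread}, \qquad 100 \times 400\ \text{KB} = 40{,}000\ \text{KB} \approx 40\ \text{MB per run}
$$

Every one of those documents is also deserialized into a `Job` on the client while the cursor is enumerated.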
Here’s the select method I’m using:
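```csharp
private static void TestSelectWithCursor(object state)
{
    // Block until the main thread releases all test threads at once.
    resetEvent.WaitOne();

    MongoCollection jobs = (state as MongoCollection);
    var q = jobs.AsQueryable<Job>();

    // Fixed seed: every thread generates the same 20 ids.
    Random r = new Random(938432094);
    List<int> ids = new List<int>();
    for (int i = 0; i != 20; ++i)
    {
        ids.Add(r.Next(1000, 100000));
    }

    Stopwatch sw = Stopwatch.StartNew();
    // Translated to an $in query on _id; it executes when the foreach enumerates it.
    var subset = from j in q
                 where j.Id.In(ids)
                 select j;

    int count = 0;
    foreach (Job job in subset)
    {
        count++;
    }
    Console.WriteLine("Retrieved {0} documents in {1} ms.", count, sw.ElapsedMilliseconds);

    // ThreadsCount++ is not atomic across threads; this assumes ThreadsCount is a static int field.
    Interlocked.Increment(ref ThreadsCount);
}
```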
The `count++` is only there to pretend I’m doing something with each document while enumerating the cursor, so please ignore it.
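A sketch of a harness for the test process described above (not necessarily how the original test was driven). `ConcurrentReadHarness` and `Run` are hypothetical names, and `work` stands in for the body of `TestSelectWithCursor`. Threads are joined instead of polling a shared counter, and each thread records its own elapsed time in its own array slot, so no locking is needed.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: runs `work` once on each of `threadCount` threads,
// releases them together, and returns each thread's elapsed milliseconds.
static class ConcurrentReadHarness
{
    public static long[] Run(int threadCount, Action<int> work)
    {
        var elapsedMs = new long[threadCount];
        var threads = new Thread[threadCount];

        // Same role as resetEvent: threads wait here until all have been started.
        using (var gate = new ManualResetEvent(false))
        {
            for (int i = 0; i < threadCount; i++)
            {
                int index = i; // copy so each closure sees its own index
                threads[i] = new Thread(() =>
                {
                    gate.WaitOne();
                    var sw = Stopwatch.StartNew();
                    work(index);                               // e.g. one 20-id query
                    elapsedMs[index] = sw.ElapsedMilliseconds; // each thread writes only its own slot
                });
                threads[i].Start();
            }

            gate.Set();                          // release every thread at once
            foreach (var t in threads) t.Join(); // wait for all threads to finish
        }
        return elapsedMs;
    }
}
```

For example, `ConcurrentReadHarness.Run(100, i => { var r = new Random(938432094 + i); /* pick 20 ids with r, run the query */ })` gives each thread its own seed. With the single fixed seed in `TestSelectWithCursor`, all 100 threads request the same 20 ids.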
Anyway, the read times I’m getting seem very slow to me. This is a typical test result:
> 100 threads created.
>
> Retrieved 20 documents in 272 ms. Retrieved 20 documents in 522 ms.
> Retrieved 20 documents in 681 ms. Retrieved 20 documents in 732 ms.
> Retrieved 20 documents in 769 ms. Retrieved 20 documents in 843 ms.
> Retrieved 20 documents in 1038 ms. Retrieved 20 documents in 1139 ms.
> Retrieved 20 documents in 1163 ms. Retrieved 20 documents in 1170 ms.
> Retrieved 20 documents in 1206 ms. Retrieved 20 documents in 1243 ms.
> Retrieved 20 documents in 1322 ms. Retrieved 20 documents in 1378 ms.
> Retrieved 20 documents in 1463 ms. Retrieved 20 documents in 1507 ms.
> Retrieved 20 documents in 1530 ms. Retrieved 20 documents in 1557 ms.
> Retrieved 20 documents in 1567 ms. Retrieved 20 documents in 1617 ms.
> Retrieved 20 documents in 1626 ms. Retrieved 20 documents in 1659 ms.
> Retrieved 20 documents in 1666 ms. Retrieved 20 documents in 1687 ms.
> Retrieved 20 documents in 1711 ms. Retrieved 20 documents in 1731 ms.
> Retrieved 20 documents in 1763 ms. Retrieved 20 documents in 1839 ms.
> Retrieved 20 documents in 1854 ms. Retrieved 20 documents in 1887 ms.
> Retrieved 20 documents in 1906 ms. Retrieved 20 documents in 1946 ms.
> Retrieved 20 documents in 1962 ms. Retrieved 20 documents in 1967 ms.
> Retrieved 20 documents in 1969 ms. Retrieved 20 documents in 1977 ms.
> Retrieved 20 documents in 1996 ms. Retrieved 20 documents in 2005 ms.
> Retrieved 20 documents in 2009 ms. Retrieved 20 documents in 2025 ms.
> Retrieved 20 documents in 2035 ms. Retrieved 20 documents in 2066 ms.
> Retrieved 20 documents in 2093 ms. Retrieved 20 documents in 2111 ms.
> Retrieved 20 documents in 2133 ms. Retrieved 20 documents in 2147 ms.
> Retrieved 20 documents in 2150 ms. Retrieved 20 documents in 2152 ms.
> Retrieved 20 documents in 2155 ms. Retrieved 20 documents in 2160 ms.
> Retrieved 20 documents in 2166 ms. Retrieved 20 documents in 2196 ms.
> Retrieved 20 documents in 2202 ms. Retrieved 20 documents in 2254 ms.
> Retrieved 20 documents in 2256 ms. Retrieved 20 documents in 2262 ms.
> Retrieved 20 documents in 2263 ms. Retrieved 20 documents in 2285 ms.
> Retrieved 20 documents in 2326 ms. Retrieved 20 documents in 2336 ms.
> Retrieved 20 documents in 2337 ms. Retrieved 20 documents in 2350 ms.
> Retrieved 20 documents in 2372 ms. Retrieved 20 documents in 2384 ms.
> Retrieved 20 documents in 2412 ms. Retrieved 20 documents in 2426 ms.
> Retrieved 20 documents in 2457 ms. Retrieved 20 documents in 2473 ms.
> Retrieved 20 documents in 2521 ms. Retrieved 20 documents in 2528 ms.
> Retrieved 20 documents in 2604 ms. Retrieved 20 documents in 2659 ms.
> Retrieved 20 documents in 2670 ms. Retrieved 20 documents in 2687 ms.
> Retrieved 20 documents in 2961 ms. Retrieved 20 documents in 3234 ms.
> Retrieved 20 documents in 3434 ms. Retrieved 20 documents in 3440 ms.
> Retrieved 20 documents in 3452 ms. Retrieved 20 documents in 3466 ms.
> Retrieved 20 documents in 3502 ms. Retrieved 20 documents in 3524 ms.
> Retrieved 20 documents in 3561 ms. Retrieved 20 documents in 3611 ms.
> Retrieved 20 documents in 3652 ms. Retrieved 20 documents in 3655 ms.
> Retrieved 20 documents in 3666 ms. Retrieved 20 documents in 3711 ms.
> Retrieved 20 documents in 3742 ms. Retrieved 20 documents in 3821 ms.
> Retrieved 20 documents in 3850 ms. Retrieved 20 documents in 4020 ms.
> Retrieved 20 documents in 5143 ms. Retrieved 20 documents in 6607 ms.
> Retrieved 20 documents in 6630 ms. Retrieved 20 documents in 6633 ms.
> Retrieved 20 documents in 6637 ms. Retrieved 20 documents in 6639 ms.
> Retrieved 20 documents in 6801 ms. Retrieved 20 documents in 9302 ms.
The bottom line is that I expected much faster reads than this, so I still suspect I’m doing something wrong.
I’m not sure what other information would help, but if anything is missing, please let me know.
In case it helps, here is the explain() output for one of the queries the test executes:
```json
{
    "cursor" : "BtreeCursor _id_ multi",
    "nscanned" : 39,
    "nscannedObjects" : 20,
    "n" : 20,
    "millis" : 0,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "isMultiKey" : false,
    "indexOnly" : false,
    "indexBounds" : {
        "_id" : [
            [3276, 3276],
            [8257, 8257],
            [11189, 11189],
            [21779, 21779],
            [22293, 22293],
            [23376, 23376],
            [28656, 28656],
            [29557, 29557],
            [32160, 32160],
            [34833, 34833],
            [35922, 35922],
            [39141, 39141],
            [49094, 49094],
            [54554, 54554],
            [67684, 67684],
            [76384, 76384],
            [85612, 85612],
            [85838, 85838],
            [91634, 91634],
            [99891, 99891]
        ]
    }
}
```
If you have any ideas, I’d be very keen to hear them.
Thank you in advance!
Marcel
I suspect that the “in” (a generic modifier) is forcing a sequential scan, loading each document in full to evaluate the where clause and bypassing the _id index. Since the random ids can be widely distributed, my guess is that each thread’s query is scanning essentially the entire database.
I suggest trying a couple of things.
(1) Query for each of the 20 docs individually, by its single id.
(2) Consider using a MongoCursor and calling Explain to see how your query uses indexes.
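Suggestion (1) could be timed with a sketch like this. `PerIdFetch` and `FetchEach` are hypothetical names, and `findById` is an assumed delegate you would wire to whatever single-document lookup by `_id` your driver version provides; it is expected to return null when no document matches. Comparing these per-lookup times with the single `$in` query shows whether the `$in` form is the bottleneck.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper for suggestion (1): one query per id instead of one $in query.
static class PerIdFetch
{
    public static List<T> FetchEach<T>(IEnumerable<int> ids, Func<int, T> findById,
                                       List<long> elapsedMs) where T : class
    {
        var found = new List<T>();
        foreach (int id in ids)
        {
            var sw = Stopwatch.StartNew();
            T doc = findById(id);                  // one lookup (round trip) per id
            elapsedMs.Add(sw.ElapsedMilliseconds); // per-lookup latency
            if (doc != null)
                found.Add(doc);                    // skip ids with no matching document
        }
        return found;
    }
}
```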
Blessings,
-Gary
P.S. The thread times seem to indicate that there are also some thread scheduling effects at work.
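One way to separate those scheduling effects from the cost of the query itself is a single-threaded baseline: run the same 20-id query back to back and compare its median with the 100-thread timings in the question. A sketch, where `SequentialBaseline` is a hypothetical helper and `work` stands in for one query:

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical helper: times `work` sequentially on one thread, with no contention.
static class SequentialBaseline
{
    public static long[] Time(int runs, Action work)
    {
        var elapsedMs = new long[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            work();
            elapsedMs[i] = sw.ElapsedMilliseconds;
        }
        return elapsedMs;
    }

    // Middle value of the sorted timings (upper middle when the count is even).
    public static long Median(long[] values)
    {
        if (values.Length == 0) throw new ArgumentException("no timings", nameof(values));
        long[] sorted = values.OrderBy(v => v).ToArray();
        return sorted[sorted.Length / 2];
    }
}
```

If the sequential median is far below the concurrent numbers, that would suggest most of the time is spent waiting under concurrency rather than executing the query.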