This seems like a common use case… but somehow I cannot get it working.
I’m attempting to use MongoDB as an enumeration store with unique items. I’ve created a collection with a byte[] Id (the unique ID) and a timestamp (a long, used for enumeration). The store is quite big (terabytes) and distributed among different servers. I am able to re-build the store from scratch currently, since I’m still in the testing phase.
What I want to do is two things:
- Create a unique id for each item that I insert. This basically means that if I insert the same ID twice, MongoDB will detect this and give an error. This approach seems to work fine.
- Continuously enumerate the store for new items from other processes. The approach I took was to add a second index on InsertId, which holds a high-precision timestamp combined with the server id and a counter (just to make it unique and ascending).
In the best case this would mean that the enumerator keeps track of an index cursor for every server. From what I’ve learned about MongoDB query processing, I expected this behavior. However, when I execute the code below, it seems to take forever to get anything.
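To make the composite InsertId idea concrete, here is a minimal in-memory sketch of packing a timestamp, a server id and a counter into one integer that fits a signed 64-bit long. The bit widths and the names `make_insert_id` and `split_insert_id` are assumptions chosen for illustration, not the layout used in the actual store, and the sketch is in Python only because it needs no database or driver.

```python
# Hypothetical bit layout (an assumption, not the layout from the post):
# 41 bits of millisecond timestamp, 10 bits of server id, 12 bits of counter.
# 41 + 10 + 12 = 63 bits, so the result fits a signed 64-bit long.
TIMESTAMP_BITS = 41
SERVER_BITS = 10
COUNTER_BITS = 12


def make_insert_id(timestamp_ms, server_id, counter):
    """Pack the three parts into one integer, timestamp in the high bits."""
    if not 0 <= timestamp_ms < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    if not 0 <= server_id < (1 << SERVER_BITS):
        raise ValueError("server id out of range")
    if not 0 <= counter < (1 << COUNTER_BITS):
        raise ValueError("counter out of range")
    return (
        (timestamp_ms << (SERVER_BITS + COUNTER_BITS))
        | (server_id << COUNTER_BITS)
        | counter
    )


def split_insert_id(insert_id):
    """Recover (timestamp_ms, server_id, counter) from a packed id."""
    counter = insert_id & ((1 << COUNTER_BITS) - 1)
    server_id = (insert_id >> COUNTER_BITS) & ((1 << SERVER_BITS) - 1)
    timestamp_ms = insert_id >> (SERVER_BITS + COUNTER_BITS)
    return timestamp_ms, server_id, counter
```

Because the timestamp sits in the high bits, ids from one server sort by time first and by counter within the same millisecond. Ids from different servers are unique, but their global order is only as good as the clocks on those servers, which is why tracking a position per server makes sense.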
long lastid = 0;
while (true)
{
    DateTime first = DateTime.UtcNow;
    foreach (var item in collection.FindAllAs<ContentItem>().OrderBy((a)=>(a.InsertId)).Take(100))
    {
        lastid = item.InsertId;
    }
    Console.WriteLine("Took {0:0.00} for 100", (DateTime.UtcNow - first).TotalSeconds);
}
I’ve read about cursors, but am unsure if they fulfill the requirements when new items are inserted into the store.
As I said, I’m not bound to any table structure or anything like that… the only thing that matters is that I can get new items over time without getting duplicates.
-Stefan.
Somehow I figured it out… more or less…
I created the query manually and ended up with something like this:
The LINQ query I put in the question doesn’t generate this query. After some digging in the code I found that it should be this LINQ query:
AsQueryable() seems to be the key: it is what triggers the translation of LINQ into MongoDB queries.
This gives results, but they still appeared to be slow (4 seconds for 10 results, 30 seconds for 100). However, when I added ‘explain()’ I noticed ‘0 millis’ for the query execution.
I stopped the process doing bulk inserts and tada, it works, and fast. In other words: the issues I was having were due to the locking behavior of MongoDB and to the way I interpreted the LINQ implementation. Since the former is a result of the initial bulk fill of the data store, the problem is solved.
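As a language-neutral illustration of the polling pattern, here is a minimal in-memory sketch that resumes from the last seen id and fetches the next batch in ascending order. It is not the MongoDB query or the C# driver API: a sorted Python list stands in for the InsertId index of a single server, and `fetch_after` and `enumerate_new` are hypothetical helper names.

```python
import bisect


def fetch_after(sorted_ids, last_id, limit=100):
    """Return up to `limit` ids strictly greater than `last_id`.

    The binary search stands in for an index seek on InsertId.
    """
    start = bisect.bisect_right(sorted_ids, last_id)
    return sorted_ids[start:start + limit]


def enumerate_new(sorted_ids, last_id, limit=100):
    """Drain everything newer than `last_id` in batches.

    Returns the items seen and the new position to resume from.
    """
    seen = []
    while True:
        batch = fetch_after(sorted_ids, last_id, limit)
        if not batch:
            return seen, last_id
        seen.extend(batch)
        last_id = batch[-1]  # remember where this batch ended
```

A caller would keep the returned position and pass it in again on the next poll, so items already handed out are never returned twice. Each poll still performs one seek into the index, which is cheap but not free.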
On the downside: I would have preferred a solution that involved serializable cursors or something like that… this ‘take’ solution has to iterate the b-tree over and over again. If someone has an answer for this, please let me know.
-Stefan.