I recently started a WPF application. I connected that to a BaseX (XML-based) database and retrieved about one million entries from it. I wanted to iterate over the entries, calculate something for each entry and then write that back to the database:
IEnumerable<Result> resultSet = baseXClient.Query("...", "database");
foreach (Result result in resultSet)
{
...
}
The problem: the inside of the foreach is never reached. The Query() method returns pretty fast, but when the foreach is reached, C# seems to do SOMETHING with the collection, and the code does not continue for a very long time (at least 10 minutes; I never let it run any longer).
What’s going on here?
I tried to limit the number of items retrieved. When retrieving 100,000 results, the same thing occurs, but the code continues after about 10-20 seconds. When retrieving the full one million results, C# seems to be stuck forever…
Any ideas?
regards
Edit: Why this is happening
As some of you pointed out, the reason for this behavior seems to be that the query is only evaluated when MoveNext() is called on the Enumerator inside the Enumerable. My database seems unable to return one value at a time, and instead returns the entire one-million-entry dataset at once. I will try to switch to another database (Apache Lucene, if possible, as it has good full-text search support) and edit this post to let you know whether it changes anything.
PS: Yes, I am aware that one million results is a lot. This is not meant for live usage, it is just a step for preparing the data. While I didn’t expect the code to run in a few seconds, I was still surprised to see SUCH poor performance in the database.
Edit: The Solution
So I migrated the XML database to Apache Lucene. Works like a charm! Of course, Lucene is a text-based database that is not suitable for every use case, but for me it worked wonders. I can iterate over one million entries in a few seconds, and one entry is fetched per loop iteration. It works extremely well!
One million of anything is a lot, so any operation that obtains that many items can be expected to take a significant amount of time. It looks like the library you use does not defer retrieving each item until it is actually needed, so you see the cost of fetching all items hidden behind the “foreach” statement.
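To illustrate how that cost can hide behind the loop, here is a minimal sketch. `EagerResults` and `LoadEverything` are hypothetical names made up for this example; they only imitate the behavior described above and are not the BaseX client's actual API.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for a client library's result set (not a real BaseX type).
class EagerResults : IEnumerable<int>
{
    private readonly int _count;

    // Counts how often the full result set was loaded.
    public int LoadCalls { get; private set; }

    public EagerResults(int count) { _count = count; }

    // Simulates fetching the complete result set in one go.
    private List<int> LoadEverything()
    {
        LoadCalls++;
        var items = new List<int>(_count);
        for (int i = 0; i < _count; i++) items.Add(i);
        return items;
    }

    // All the work happens here, on the very first step of foreach.
    public IEnumerator<int> GetEnumerator() => LoadEverything().GetEnumerator();

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

static class EagerDemo
{
    static void Main()
    {
        // Creating the object is instant: nothing has been loaded yet.
        var results = new EagerResults(1_000_000);
        Console.WriteLine($"Loads before foreach: {results.LoadCalls}");

        foreach (int item in results)
        {
            // The whole set was already loaded before the first item arrived.
            Console.WriteLine($"Loads at first item: {results.LoadCalls}");
            break;
        }
    }
}
```

Creating the object costs nothing, so the call that "runs the query" looks fast. The full load only happens when foreach asks for the enumerator, which is why the program appears to hang at the loop header before the body is ever entered.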
What happens:
“foreach” is not a single operation, but rather several calls on IEnumerable and IEnumerator: one call to IEnumerable.GetEnumerator, followed by repeated calls to IEnumerator.MoveNext.
The first call, GetEnumerator, may be implemented with deferred execution (the most common way LINQ queries are written) or immediate execution (which seems to be the case for your collection). Calls to MoveNext could also trigger immediate execution of the whole query even if you are asking for just a single item, or each call can fetch just a single item. For example, most LINQ queries fetch only the next item from the iterator on each call.
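The calls described above can be written out by hand. The sketch below shows roughly what the compiler generates for a foreach loop, paired with a streaming source that produces one item per MoveNext call. `Stream` and `ManualForeach` are illustrative names invented for this example, not part of any library.

```csharp
using System;
using System.Collections.Generic;

static class ForeachDemo
{
    // Hypothetical streaming source: the iterator body runs only when
    // MoveNext is called, and produces exactly one item per call.
    public static IEnumerable<int> Stream(int count, List<string> log)
    {
        for (int i = 0; i < count; i++)
        {
            log.Add($"produce {i}");
            yield return i;
        }
    }

    // Roughly equivalent to: foreach (int item in source) { ... }
    public static void ManualForeach(IEnumerable<int> source, List<string> log)
    {
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                int item = e.Current;
                log.Add($"consume {item}");
            }
        }
    }

    static void Main()
    {
        var log = new List<string>();

        // Calling the iterator method does not run its body yet.
        IEnumerable<int> source = Stream(3, log);
        Console.WriteLine($"Log entries after creating the query: {log.Count}");

        ManualForeach(source, log);
        Console.WriteLine(string.Join(", ", log));
    }
}
```

With a deferred, streaming source like this, the log is still empty after the query is created, and the "produce" and "consume" entries then alternate: each item is fetched only when the loop asks for it. A source that loads everything on GetEnumerator or on the first MoveNext would instead do all of its work before the first "consume" entry.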