To keep my code cleaner I often try to break down parts of my data access code in LINQ to SQL into private sub-methods, just like with plain-old business logic code. Let me give a very simplistic example:
```csharp
public IEnumerable<Item> GetItemsFromRepository()
{
    var setA = from a in this.dataContext.TableA
               where /* criteria */
               select a.Prop;
    return DoSubQuery(setA);
}

private IEnumerable<Item> DoSubQuery(IEnumerable<DateTimeOffset> set)
{
    return from item in set
           where /* criteria */
           select item;
}
```
I’m sure it’s not hard to imagine more complex examples, with deeper nesting or with the results of one set used to filter other queries.
My basic question is this: I’ve seen some significant performance differences, and even exceptions being thrown, just from reorganizing LINQ to SQL code into private methods. Can anyone explain the rules behind these behaviors so that I can make informed decisions about how to write efficient, clean data access code?
Some questions I’ve had:
1) When does passing a `System.Data.Linq.Table<T>` instance to a method cause query execution?
2) When does using a `System.Data.Linq.Table<T>` in another query cause execution?
3) Are there limits to what types of operations (Take, First, Last, order by, etc.) can be applied to a `System.Data.Linq.Table<T>` passed as a parameter into a method?
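Remove the implicit cast: have `DoSubQuery` (and `GetItemsFromRepository()`) take and return `IQueryable<T>` rather than `IEnumerable<T>`.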
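A minimal sketch of that change, runnable without a database. The `Item` class with its `Id` and `Prop` properties, the two placeholder filters standing in for `/* criteria */`, and the in-memory `AsQueryable()` source standing in for `this.dataContext.TableA` are all hypothetical. The element type stays `DateTimeOffset` throughout, because the question's `select item` yields `DateTimeOffset` values rather than `Item`s. The point is only the signatures: with an `IQueryable<T>` parameter, the `where` in `DoSubQuery` binds to `Queryable.Where` and is appended to the same expression tree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity standing in for a row of TableA.
public sealed class Item
{
    public int Id { get; set; }
    public DateTimeOffset Prop { get; set; }
}

public sealed class Repository
{
    // Stand-in for this.dataContext.TableA; with LINQ to SQL this would be a Table<Item>.
    private readonly IQueryable<Item> tableA;

    public Repository(IEnumerable<Item> rows) => tableA = rows.AsQueryable();

    // Returns IQueryable so the caller can keep composing before anything executes.
    public IQueryable<DateTimeOffset> GetItemsFromRepository()
    {
        var setA = from a in tableA
                   where a.Id > 0                 // placeholder for the first /* criteria */
                   select a.Prop;
        return DoSubQuery(setA);
    }

    // Because the parameter is IQueryable<T>, this 'where' binds to Queryable.Where
    // and is appended to setA's expression tree instead of running in memory.
    private static IQueryable<DateTimeOffset> DoSubQuery(IQueryable<DateTimeOffset> set)
    {
        return from item in set
               where item.Year >= 2000            // placeholder for the second /* criteria */
               select item;
    }

    // The question's shape, for comparison: the IEnumerable<T> parameter forces Enumerable.Where.
    public static IEnumerable<DateTimeOffset> DoSubQueryAsEnumerable(IEnumerable<DateTimeOffset> set)
    {
        return from item in set
               where item.Year >= 2000
               select item;
    }
}
```

With LINQ to SQL, the root of the returned query's `Expression` is still a `Queryable.Where` call, so the provider can translate both filters together; the result of `DoSubQueryAsEnumerable` is a plain in-memory iterator.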
The implicit cast from `IQueryable<Item>` to `IEnumerable<Item>` is essentially the same as calling `AsEnumerable()` on your `IQueryable<Item>`. There are of course times when you want that, but you should leave things as `IQueryable` by default, so that the entire query can be performed on the database, rather than merely the `GetItemsFromRepository()` bit with the rest being done in memory.

The secondary questions:
1. When something needs a final result that is neither a queryable object nor a loaded-as-it-goes enumerable, such as `Max()`, `ToList()`, etc., or when you start enumerating the results (e.g. with `foreach`). Passing a `Table<T>` to a method does not by itself execute anything. Note, though, that while `AsEnumerable()` does not cause query execution, it does mean that when execution does happen, only the part before the `AsEnumerable()` will be performed against the source data source; that part then produces an on-demand, in-memory data source against which the rest is performed.
2. The same as above.
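   `Table<T>` implements `IQueryable<T>`. If you, e.g., join two of them together, that won’t yet cause anything to be executed.
3. Those that are defined for `IQueryable<T>`, i.e. the `System.Linq.Queryable` extension methods. A provider can still reject an operator it can’t translate at run time, though: LINQ to SQL, for example, throws a `NotSupportedException` for `Last()`.

Edit: Some clarification on the differences and similarities between `IEnumerable` and `IQueryable`.

Just about anything you can do on an `IQueryable` you can do on an `IEnumerable` and vice versa, but how it’s performed will be different.

Any given `IQueryable` implementation can be used in LINQ queries and will have all the LINQ extension methods like `Take()`, `Select()`, `GroupBy()` and so on. Just how this is done depends on the implementation. For example, LINQ to SQL’s `System.Data.Linq.Table<T>` implements those methods by turning the query into an SQL query, the results of which are turned into objects on an as-loaded basis. So if `mySource` is a table, then a query such as `from item in mySource where item.ID < 23 select new { item.ID, item.Name }` gets turned into SQL like `SELECT ID, Name FROM mySourceTable WHERE ID < 23`.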
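As a rough illustration of the mechanics: the `Queryable` operators don’t run anything themselves. Each call wraps its source’s expression in a `MethodCallExpression`, and it is this tree that a provider such as LINQ to SQL translates into SQL. In this sketch an in-memory `AsQueryable()` source stands in for the table, and `Row` with its `ID`/`Name` properties is a hypothetical type; it builds the `ID < 23` query and lists the operators recorded in its tree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical row type with the ID and Name columns used in the text.
public sealed record Row(int ID, string Name);

public static class ExpressionDemo
{
    // Builds the query without executing it and lists the operators recorded in its expression tree.
    public static List<string> RecordedOperators(IQueryable<Row> mySource)
    {
        var filtered = from item in mySource
                       where item.ID < 23
                       select new { item.ID, item.Name };

        var names = new List<string>();
        Expression node = filtered.Expression;
        // Each Queryable operator keeps its source expression as Arguments[0];
        // the walk stops at the constant node that represents mySource itself.
        while (node is MethodCallExpression call)
        {
            names.Add($"{call.Method.DeclaringType?.Name}.{call.Method.Name}");
            node = call.Arguments[0];
        }
        names.Reverse(); // first-applied operator first
        return names;
    }
}
```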
The SQL results are then read through an enumerator, such that on each call to `MoveNext()` another row is read from the results and a new anonymous object is created from it.

On the other hand, if `mySource` were a `List` or a `HashSet`, or anything else that implements `IEnumerable<T>` but doesn’t have its own query engine, then the linq-to-objects code would turn the same query into something like a `foreach` over `mySource` that does a `yield return new { item.ID, item.Name }` for each `item` with `item.ID < 23`. That is about as efficient as that code could be done in memory. The results will be the same, but the way to get them is different.
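A sketch of that linq-to-objects equivalent next to the query-syntax version, assuming the same hypothetical `Row` type. A value tuple replaces the anonymous type only because an anonymous type can’t be named as a method’s return type; this is an approximation of what `Enumerable.Where`/`Select` do, not their actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type with ID and Name.
public sealed record Row(int ID, string Name);

public static class InMemoryFilter
{
    // Roughly what linq-to-objects does for: where item.ID < 23 select new { item.ID, item.Name }.
    public static IEnumerable<(int ID, string Name)> Filtered(IEnumerable<Row> mySource)
    {
        foreach (var item in mySource)
        {
            if (item.ID < 23)
                yield return (item.ID, item.Name);
        }
    }

    // The query-syntax version; over an IEnumerable<T> it binds to Enumerable.Where/Select.
    public static IEnumerable<(int ID, string Name)> FilteredWithLinq(IEnumerable<Row> mySource)
    {
        return from item in mySource
               where item.ID < 23
               select (item.ID, item.Name);
    }
}
```

Both produce the same sequence, lazily, one element per `MoveNext()`.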
Now, since every `IQueryable<T>` can be converted into the equivalent `IEnumerable<T>`, we could, if we wanted to, take the first `mySource` (where execution happens in a database) and call `AsEnumerable()` on it before applying the same `where` and `select`. Here, while there is still nothing executed against the database until we iterate through the results or call something that examines all of those results, once we do so, it’s as if we split the execution into two separate steps. The first step would be to execute the SQL `SELECT * FROM mySourceTable`, and the second, the filtering and projection, would be done like the linq-to-objects example above.

It’s not hard to see how, if the database contained 10 items with an ID < 23 and 50,000 items with a higher ID, this is now much, much less performant.
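An in-memory stand-in can’t show the SQL, but it can show the two observable parts of that claim. In this sketch the hypothetical `Source` iterator counts how many rows the "data source" hands out, playing the role of rows sent back by the database. Nothing is read until `ToList()`, and because the filter comes after `AsEnumerable()`, it runs in memory over every row the source delivers: four rows are read to produce two results. With LINQ to SQL and no `AsEnumerable()`, the filter would instead be part of the SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type with ID and Name.
public sealed record Row(int ID, string Name);

public static class SplitDemo
{
    // Number of rows handed out by the source; stands in for rows returned by a database.
    public static int RowsRead;

    private static IEnumerable<Row> Source(IEnumerable<Row> rows)
    {
        foreach (var row in rows)
        {
            RowsRead++;
            yield return row;
        }
    }

    public static (int beforeExecution, int afterExecution, int resultCount, bool stillQueryable)
        Run(IEnumerable<Row> rows)
    {
        RowsRead = 0;
        IQueryable<Row> mySource = Source(rows).AsQueryable();

        // AsEnumerable() switches to linq-to-objects: the Where below is Enumerable.Where,
        // so a real provider would only ever be asked for mySource itself (SELECT * ...).
        var filtered = mySource.AsEnumerable()
                               .Where(item => item.ID < 23)
                               .Select(item => new { item.ID, item.Name });

        int before = RowsRead;                 // nothing has executed yet
        int count = filtered.ToList().Count;   // ToList() forces execution
        return (before, RowsRead, count, filtered is IQueryable);
    }
}
```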
As well as offering the explicit `AsEnumerable()` method, every `IQueryable<T>` can be implicitly cast to `IEnumerable<T>`. This lets us use `foreach` on them and pass them to any other existing code that handles `IEnumerable<T>`, but if we accidentally do it at an inappropriate time, we can make queries much slower. This is what was happening when your `DoSubQuery` was defined to take an `IEnumerable<DateTimeOffset>` and return an `IEnumerable<Item>`: it implicitly called `AsEnumerable()` on your `IQueryable<DateTimeOffset>` and your `IQueryable<Item>`, and caused what could have been performed on the database to be performed in memory.

For this reason, 99% of the time, we want to keep dealing in `IQueryable` until the very last moment.

As an example of the opposite, though, just to point out that `AsEnumerable()` and the casts to `IEnumerable<T>` aren’t there out of madness, we should consider two things. The first is that `IEnumerable<T>` lets us do things that can’t be done otherwise, such as joining two completely different sources that don’t know about each other (e.g. two different databases, a database and an XML file, etc.).

Another is that sometimes `IEnumerable<T>` is actually more efficient too. Consider a `groupingQuery` that groups the rows of `mySource` by `Name`, and two lists built from it: `list1`, which only needs an aggregate (say, a count) for each group, and `list2`, which needs the items in each group. Here `groupingQuery` is set up as a queryable that does some grouping but hasn’t been executed in any way. When we create `list1`, we first create a new `IQueryable` based on it, and the query engine does its best to work out the best SQL for it, coming up with a single `GROUP BY` query, something like `SELECT Name, COUNT(*) FROM mySourceTable GROUP BY Name`, which is performed pretty efficiently. Then the rows are turned into objects, which are then put into a list.
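The two query shapes being compared might look like this. The `Row` type, the count aggregate and the per-group ID lists are assumptions, and dictionaries stand in for `list1` and `list2` so the results are easy to inspect. The in-memory `AsQueryable()` source issues no SQL at all; the comments only restate what the text says LINQ to SQL would do with each shape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row type with ID and Name.
public sealed record Row(int ID, string Name);

public static class GroupingDemo
{
    public static (Dictionary<string, int> counts, Dictionary<string, List<int>> items) Run(IQueryable<Row> mySource)
    {
        // Only a description of the grouping; nothing is executed here.
        var groupingQuery = from item in mySource
                            group item by item.Name into g
                            select g;

        // Shape of list1: one aggregate per group, which maps naturally
        // onto a single GROUP BY query with COUNT(*).
        var counts = groupingQuery
            .Select(g => new { Name = g.Key, Count = g.Count() })
            .ToDictionary(x => x.Name, x => x.Count);

        // Shape of list2: the items of each group, which has no single natural
        // SQL translation (one query for the keys, then one more per key).
        var items = groupingQuery
            .Select(g => new { Name = g.Key, IDs = g.Select(i => i.ID).ToList() })
            .ToDictionary(x => x.Name, x => x.IDs);

        return (counts, items);
    }
}
```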
On the other hand, with the second query, there isn’t as natural an SQL conversion for a `group by` that doesn’t apply aggregate methods to all of the non-grouped items, so the best the query engine can come up with is to first fetch the distinct names and then, for every name it receives, run another query for the rows with that name. And so on, whether that means 2 SQL queries or 200,000.
In this case, we’re much better off working on `mySource.AsEnumerable()`, because here it is more efficient to grab the whole table into memory first. (Even better would be to work on `mySource.Select(item => new { item.ID, item.Name }).AsEnumerable()`, because then we still only retrieve the columns we care about from the database, and switch to in-memory at that point.)

The last bit is worth remembering because it breaks our rule that we should stay with `IQueryable<T>` as long as possible. It isn’t something to worry about much, but it is worth keeping an eye on if you do grouping and find yourself with a very slow query.
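A sketch of that "project, then switch to in-memory" pattern, again assuming a hypothetical `Row` type and an in-memory `AsQueryable()` source in place of a table. It also reports the outermost operator in the expression handed to the provider, which is just the `Select` projection; the grouping happens afterwards in linq-to-objects, in a single pass over the rows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical row type with ID and Name.
public sealed record Row(int ID, string Name);

public static class ProjectThenGroup
{
    // Returns the grouped IDs plus the name of the outermost operator the provider was given.
    public static (Dictionary<string, List<int>> idsByName, string providerSaw) Run(IQueryable<Row> mySource)
    {
        // Still IQueryable: with LINQ to SQL only the ID and Name columns would be selected.
        var projected = mySource.Select(item => new { item.ID, item.Name });

        var idsByName = projected
            .AsEnumerable()                    // switch to linq-to-objects here
            .GroupBy(x => x.Name, x => x.ID)   // grouping is done in memory
            .ToDictionary(g => g.Key, g => g.ToList());

        var providerSaw = ((MethodCallExpression)projected.Expression).Method.Name;
        return (idsByName, providerSaw);
    }
}
```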