Does the placement of the columns to join matter for performance when executing a LINQ statement?
For example, which of the following queries would run the quickest, and why?
A)
var query = from o in entities.orders
            join i in entities.order_items
            on o.OrderId equals i.OrderId
            where o.AddedSalesOrder == 0
            select new
            {
                i.ShippingFirstName,
                i.ShippingLastName,
                i.Sku,
                i.Quantity,
                i.ItemPrice,
                o.TotalShippingCost,
                o.OrderId,
                o.OrderCreateDate
            };
B)
var query = from o in entities.orders
            join i in entities.order_items
            on o.OrderId equals i.OrderId
            where o.AddedSalesOrder == 0
            select new
            {
                o.TotalShippingCost,
                o.OrderId,
                o.OrderCreateDate,
                i.ShippingFirstName,
                i.ShippingLastName,
                i.Sku,
                i.Quantity,
                i.ItemPrice
            };
C)
var query = from o in entities.orders
            join i in entities.order_items
            on o.OrderId equals i.OrderId
            where o.AddedSalesOrder == 0
            select new
            {
                o.OrderCreateDate,
                i.ShippingFirstName,
                i.ShippingLastName,
                o.TotalShippingCost,
                o.OrderId,
                i.Sku,
                i.Quantity,
                i.ItemPrice
            };
I expect query B to be the most efficient, since the placement of the columns relative to the join should result in cleaner generated SQL, but I may be wrong.
If it matters, the queries are being run against a SQL Server 2008 R2 database.
–Edit–
For what it's worth, I ran a quick (and surely not definitive) benchmark in C# to see how each scenario performed. My findings are below:
a) 297.61 millisecond avg over 100000 iterations
b) 245.90 millisecond avg over 100000 iterations
c) 304.16 millisecond avg over 100000 iterations
The code I used to test this is as follows:
var sw = new Stopwatch();
List<long> totalTime = new List<long>();
for (int u = 0; u < 100000; u++)
{
    sw.Start();
    var entities = new Entities();
    var query = from o in entities.orders
                join i in entities.order_items
                on o.OrderId equals i.OrderId
                where o.AddedSalesOrder == 1
                select new
                {
                    i.ShippingFirstName,
                    i.ShippingLastName,
                    i.Sku,
                    i.Quantity,
                    i.ItemPrice,
                    o.TotalShippingCost,
                    o.OrderId,
                    o.OrderCreateDate
                };
    var qc = query.Count();
    sw.Stop();
    totalTime.Add(sw.ElapsedMilliseconds);
    sw.Reset();
}
Console.WriteLine("Average time in Milliseconds: {0}", totalTime.Average());
It appears that the ordering of the joined columns may impact the speed of execution, or, as was pointed out, my database may be inefficient 🙂
At any rate, I wanted to post the findings for any who find this interesting.
In SQL, the order of joins and columns usually does not matter: provided you have a good query optimizer and good statistics on your database, the database engine will restructure your query for maximum performance.
In general, that is not true for LINQ itself (for example, in-memory LINQ to Objects): unlike SQL, the operators are not reordered for execution, but are instead evaluated lazily in the same order they are typed. If you’re grabbing spatially separated data, or picking a bad merge order, your execution speed will suffer.
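To make the in-memory case concrete, here is a minimal sketch using LINQ to Objects. The class `OperatorOrderDemo` and the method `ExpensiveProjection` are made-up names for illustration only, and the "expensive" work is just a multiplication with a call counter. It shows that the operators run in the order written: filtering first invokes the projection far fewer times than projecting first, even though both pipelines keep the same elements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperatorOrderDemo
{
    // Counts how many times the (pretend-expensive) projection runs.
    public static int ProjectionCalls;

    // Hypothetical stand-in for costly per-element work.
    private static int ExpensiveProjection(int value)
    {
        ProjectionCalls++;
        return value * value;
    }

    // Filter first: the projection only runs for elements that pass the filter.
    public static int CallsWhenFilteringFirst(IEnumerable<int> source)
    {
        ProjectionCalls = 0;
        source.Where(n => n % 10 == 0)
              .Select(n => ExpensiveProjection(n))
              .ToList(); // force the lazy pipeline to execute
        return ProjectionCalls;
    }

    // Project first: the projection runs for every element, then most are discarded.
    public static int CallsWhenProjectingFirst(IEnumerable<int> source)
    {
        ProjectionCalls = 0;
        source.Select(n => new { Original = n, Squared = ExpensiveProjection(n) })
              .Where(x => x.Original % 10 == 0)
              .ToList(); // force the lazy pipeline to execute
        return ProjectionCalls;
    }

    public static void Main()
    {
        var numbers = Enumerable.Range(1, 1000);
        Console.WriteLine(CallsWhenFilteringFirst(numbers));  // 100
        Console.WriteLine(CallsWhenProjectingFirst(numbers)); // 1000
    }
}
```

Nothing reorders these operators for you, so with in-memory LINQ the cheaper ordering is something you have to choose yourself.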
The good news is that you should be safe. For LINQ to SQL or LINQ to Entities, while the SQL generated will (usually) be in roughly the same order as you typed it, you’ll still be hitting the SQL database’s optimization engine. In this case, the order of joins and column names generally will not matter.
As always, bad statistics or a poor database optimizer can still bite you. If you suspect that, rather than ask on Stack Overflow, your best bet is to check which query plans are actually being used by breaking out SQL Server Profiler.
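It may also be worth ruling out measurement noise in the C# timing loop itself before reading much into differences of a few tens of milliseconds. The sketch below is one possible harness, not a definitive benchmark: `BenchmarkSketch` and `AverageMilliseconds` are hypothetical names, and an in-memory query stands in for the database call. It runs a few untimed warm-up iterations so one-off startup costs are excluded, and it uses `Stopwatch.Elapsed.TotalMilliseconds` so sub-millisecond time is not truncated.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class BenchmarkSketch
{
    // Runs `action` a few times untimed, then returns the mean duration
    // in milliseconds over `measuredRuns` timed iterations.
    public static double AverageMilliseconds(Action action, int warmupRuns, int measuredRuns)
    {
        if (measuredRuns <= 0)
            throw new ArgumentOutOfRangeException(nameof(measuredRuns));

        // Warm-up: absorbs JIT compilation and other first-call costs.
        for (int i = 0; i < warmupRuns; i++)
            action();

        var sw = new Stopwatch();
        double totalMs = 0;
        for (int i = 0; i < measuredRuns; i++)
        {
            sw.Restart();                            // reset and start in one call
            action();
            sw.Stop();
            totalMs += sw.Elapsed.TotalMilliseconds; // fractional milliseconds
        }
        return totalMs / measuredRuns;
    }

    public static void Main()
    {
        // In-memory stand-in for the real query; replace with the code under test.
        var data = Enumerable.Range(1, 100000).ToArray();
        double avg = AverageMilliseconds(() => data.Where(n => n % 3 == 0).ToList(), 5, 50);
        Console.WriteLine("Average time in milliseconds: {0:F3}", avg);
    }
}
```

When timing the queries from the question, note that the benchmark calls `Count()`. As far as I know, LINQ to Entities translates that into a SQL `COUNT` aggregate, so the projected columns probably never reach the generated SQL at all. Materializing the results, for example with `ToList()`, would exercise the projection instead.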