Background
I have a SQL dataset that is exposed as a view and queried via LINQ-to-Entities. Its purpose is to provide outstanding account balances on a credit report, bucketed by how long they have been outstanding: 30 days, 60 days, and so on.
A sample table is too difficult to format here on StackOverflow, but the following SQL SELECT statement should give you an idea of the original data structure:
SELECT TOP 1000 [TransactionId]
    ,[IndustrySector]
    ,[DataContributorId]
    ,[ExperienceMonth]
    ,[ExperienceMonthText]
    ,[Balance]
    ,[ARCurrent]
    ,[AR1to30PD]
    ,[AR31to60PD]
    ,[AR61to90PD]
    ,[Ar91PlusPD]
    ,[WeightedDTP]
FROM [BCC].[dbo].[vwTransactionExperienceDetail]
Now, when I call this view via LINQ, the ultimate goal is to construct an object that will be returned as JSON to the requesting client. The resulting object needs to be a hierarchy of groupings by Industry, then by Contributors (of the reported data), and finally of individual Reports. To do this, the following LINQ query works fine and is quite fast:
/// <summary>
/// Gets the 25 month experience detail report with summed parameters (balance, DTP, etc).
/// </summary>
/// <param name="id">The transaction id.</param>
/// <returns>List<ExperienceDetail></returns>
public static List<ExperienceDetail> Get25MonthExperienceDetail_Sum(int id)
{
    var db = new BCCEntities();
    return
        db.vwTransactionExperienceDetails.Where(te => te.TransactionId == id)
            .GroupBy(g => g.IndustrySector)
            .Select(i => new ExperienceDetail
            {
                Industry = i.Key,
                NumberOfContributors = i.GroupBy(c => c.DataContributorId).Count(),
                Balance = i.Sum(s => s.Balance),
                OneToThirty = i.Sum(s => s.ARCurrent),
                ThirtyOneToSixty = i.Sum(s => s.AR1to30PD),
                SixtyOneToNinety = i.Sum(s => s.AR31to60PD),
                NinetyOneToOneTwenty = i.Sum(s => s.AR61to90PD),
                OneTwentyOnePlus = i.Sum(s => s.Ar91PlusPD),
                DTP = (i.Sum(s => s.Balance) != 0) ? i.Sum(s => s.WeightedDTP) / i.Sum(s => s.Balance) : i.Sum(s => s.WeightedDTP),
                Contributions = i.GroupBy(dc => dc.DataContributorId).Select(c => new Contribution
                {
                    Balance = c.Sum(s => s.Balance),
                    OneToThirty = c.Sum(s => s.ARCurrent),
                    ThirtyOneToSixty = c.Sum(s => s.AR1to30PD),
                    SixtyOneToNinety = c.Sum(s => s.AR31to60PD),
                    NinetyOneToOneTwenty = c.Sum(s => s.AR61to90PD),
                    OneTwentyOnePlus = c.Sum(s => s.Ar91PlusPD),
                    DTP = (c.Sum(s => s.Balance) != 0) ? c.Sum(s => s.WeightedDTP) / c.Sum(s => s.Balance) : c.Sum(s => s.WeightedDTP),
                    ContributorId = c.Key,
                    Reports = c.Select(r => new Report
                    {
                        DTP = (r.Balance != 0) ? r.WeightedDTP / r.Balance : r.WeightedDTP,
                        ReportDate = r.ExperienceMonth,
                        Balance = r.Balance,
                        OneToThirty = r.ARCurrent,
                        ThirtyOneToSixty = r.AR1to30PD,
                        SixtyOneToNinety = r.AR31to60PD,
                        NinetyOneToOneTwenty = r.AR61to90PD,
                        OneTwentyOnePlus = r.Ar91PlusPD,
                        ContributorId = r.DataContributorId,
                        Industry = i.Key
                    })
                })
            }).ToList();
}
The Problem
I need to create an additional service that provides the same data, but only for the most recent month reported by each contributor (DataContributorId). The following LINQ query works for this, but it is EXTREMELY slow, taking nearly a full minute to return the results:
/// <summary>
/// Gets an experience detail report with summed parameters (balance, DTP, etc) for the most recent month.
/// </summary>
/// <param name="id">The transaction id.</param>
/// <returns>List<ExperienceDetail></returns>
public static List<ExperienceDetail> Get25MonthExperienceDetail_MostRecentMonth(int id)
{
    var db = new BCCEntities();
    db.CommandTimeout = 100000;
    return
        db.vwTransactionExperienceDetails.Where(te => te.TransactionId == id)
            .OrderByDescending(o => o.ExperienceMonth)
            .GroupBy(g => g.IndustrySector)
            .Select(i => new ExperienceDetail
            {
                Industry = i.Key,
                NumberOfContributors = i.GroupBy(c => c.DataContributorId).Count(),
                Balance = i.GroupBy(dc => dc.DataContributorId).Sum(x => x.Select(z => z.Balance).FirstOrDefault()),
                OneToThirty = i.Sum(s => s.ARCurrent),
                ThirtyOneToSixty = i.Sum(s => s.AR1to30PD),
                SixtyOneToNinety = i.Sum(s => s.AR31to60PD),
                NinetyOneToOneTwenty = i.Sum(s => s.AR61to90PD),
                OneTwentyOnePlus = i.Sum(s => s.Ar91PlusPD),
                DTP = (i.Sum(s => s.Balance) != 0) ? i.Sum(s => s.WeightedDTP) / i.Sum(s => s.Balance) : i.Sum(s => s.WeightedDTP),
                Contributions = i.GroupBy(dc => dc.DataContributorId).Select(c => new Contribution
                {
                    Balance = c.Take(1).Sum(s => s.Balance),
                    OneToThirty = c.Take(1).Sum(s => s.ARCurrent),
                    ThirtyOneToSixty = c.Take(1).Sum(s => s.AR1to30PD),
                    SixtyOneToNinety = c.Take(1).Sum(s => s.AR31to60PD),
                    NinetyOneToOneTwenty = c.Take(1).Sum(s => s.AR61to90PD),
                    OneTwentyOnePlus = c.Take(1).Sum(s => s.Ar91PlusPD),
                    DTP = (c.Take(1).Sum(s => s.Balance) != 0) ? c.Take(1).Sum(s => s.WeightedDTP) / c.Take(1).Sum(s => s.Balance) : c.Take(1).Sum(s => s.WeightedDTP),
                    ContributorId = c.Key,
                    Reports = c.Select(r => new Report
                    {
                        DTP = (r.Balance != 0) ? r.WeightedDTP / r.Balance : r.WeightedDTP,
                        ReportDate = r.ExperienceMonth,
                        Balance = r.Balance,
                        OneToThirty = r.ARCurrent,
                        ThirtyOneToSixty = r.AR1to30PD,
                        SixtyOneToNinety = r.AR31to60PD,
                        NinetyOneToOneTwenty = r.AR61to90PD,
                        OneTwentyOnePlus = r.Ar91PlusPD,
                        ContributorId = r.DataContributorId,
                        Industry = i.Key
                    }).Take(1)
                })
            }).ToList();
}
Question
How do I query for this “Most Recent Month Reported” result set without taking the performance hit? I have spent the past several hours trying to isolate the part of the query that takes the most time, but I can’t seem to spot it. Admittedly, I don’t know how to effectively profile performance issues with complex LINQ queries, and I am open to suggestions.
Ultimately the question is: Is there an alternative to this LINQ query that will produce the same result set without such a severe performance penalty?
Thanks in advance.
Assuming the dataset is reasonably small, I’d just pull in all the months, call ToList(), then filter down to just the most recent month in memory. LINQ can do some really strange things when the query gets complicated.
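Here is a rough sketch of the in-memory filtering step, not a drop-in replacement. ExperienceRow is a hypothetical stand-in for the view's entity with only a few of its columns, and I'm assuming ExperienceMonth is a DateTime and Balance is a decimal, which the question doesn't actually state. In the real method, the input list would come from the existing Where(...) filter followed by ToList(), so that everything after that point runs as LINQ to Objects rather than being translated to SQL.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a row of vwTransactionExperienceDetail.
// Column types are assumptions; only the columns needed here are included.
public class ExperienceRow
{
    public string IndustrySector { get; set; }
    public int DataContributorId { get; set; }
    public DateTime ExperienceMonth { get; set; }
    public decimal Balance { get; set; }
}

public static class MostRecentMonthFilter
{
    // Keeps only the newest row for each (industry, contributor) pair.
    // Works on an in-memory sequence, so no SQL is generated for this step.
    public static List<ExperienceRow> LatestPerContributor(IEnumerable<ExperienceRow> rows)
    {
        return rows
            .GroupBy(r => new { r.IndustrySector, r.DataContributorId })
            .Select(g => g.OrderByDescending(r => r.ExperienceMonth).First())
            .ToList();
    }
}

public static class Demo
{
    public static void Main()
    {
        // Sample data standing in for the materialized query result.
        var rows = new List<ExperienceRow>
        {
            new ExperienceRow { IndustrySector = "Retail", DataContributorId = 1, ExperienceMonth = new DateTime(2012, 1, 1), Balance = 100m },
            new ExperienceRow { IndustrySector = "Retail", DataContributorId = 1, ExperienceMonth = new DateTime(2012, 2, 1), Balance = 150m },
            new ExperienceRow { IndustrySector = "Retail", DataContributorId = 2, ExperienceMonth = new DateTime(2012, 1, 1), Balance = 200m }
        };

        // One row per contributor survives: the one with the latest month.
        foreach (var row in MostRecentMonthFilter.LatestPerContributor(rows))
        {
            Console.WriteLine("{0} / {1} / {2:yyyy-MM} / {3}",
                row.IndustrySector, row.DataContributorId, row.ExperienceMonth, row.Balance);
        }
    }
}
```

Once the list is reduced to one row per contributor, the grouping and Sum projection from the original Get25MonthExperienceDetail_Sum query could presumably be applied to it unchanged, since each contributor group would then contain only its latest report. Whether this is fast enough depends on how many rows a single TransactionId returns.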