In my C# Class Library project I have a method, GetFaultRate, that computes a statistic: given a date, it returns the number of products with faults divided by the number of products produced.
float GetFaultRate(DateTime date)
{
var products = GetProducts(date);
var faultyProducts = GetFaultyProducts(date);
var rate = (float) faultyProducts.Count() / products.Count(); // cast before dividing to avoid integer division
return rate;
}
Both methods, GetProducts and GetFaultyProducts, take their data from a repository class, _productRepository.
IEnumerable<Product> GetProducts(DateTime date)
{
var products = _productRepository.GetAll().ToList();
var periodProducts = products.Where(p => CustomFunction(p.ProductionDate) == date);
return periodProducts;
}
IEnumerable<Product> GetFaultyProducts(DateTime date)
{
var products = _productRepository.GetAll().ToList();
var periodFaultyProducts = products.Where(p => CustomFunction(p.ProductionDate) == date && p.Faulty == true);
return periodFaultyProducts;
}
Here GetAll has the signature:
IQueryable<Product> GetAll();
There are many products in the database, and it takes a long time to retrieve them and convert them with ToList(). I need to enumerate the collection because a custom function such as CustomFunction cannot be executed in an IQueryable<T>.
My application gets stuck for a long time before obtaining the fault rate. I guess it is because of the large number of objects to be retrieved. I could remove the two functions GetProducts and GetFaultyProducts and implement the logic inside GetFaultRate. However, since I have other functions that use GetProducts and GetFaultyProducts, that solution would give me only one access to the database but a lot of duplicate code.
What can be a good compromise?
First off, don’t convert the IQueryable to a list. It forces the entire data set to be brought into memory all at once, rather than just calling Where directly on the query, which will allow you to filter the data as it comes in. This will substantially decrease your memory footprint, and (very) marginally increase the runtime speed. If you need to convert an IQueryable to an IEnumerable so that the Where isn’t executed by the database, simply use AsEnumerable.
Next, getting all of the data is something you should avoid if at all possible, especially multiple times. You’d need to show us what your date function does, but it’s possible that it is something that could be done on the database. Any filtering you can do at all at the database will substantially increase performance.
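As a minimal sketch of the AsEnumerable suggestion, the method below filters rows as they are read instead of copying them all into a list first. The Product class, the body of CustomFunction (here it just drops the time of day) and passing the query in as a parameter are assumptions made so the example stands alone; the question does not show them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the entity; only the two members used in the question.
public class Product
{
    public DateTime ProductionDate { get; set; }
    public bool Faulty { get; set; }
}

public static class ProductQueries
{
    // Hypothetical stand-in for the question's CustomFunction.
    public static DateTime CustomFunction(DateTime value) => value.Date;

    // AsEnumerable switches to LINQ to Objects without buffering, so each
    // row is tested as it is enumerated rather than after a full ToList().
    public static IEnumerable<Product> GetProducts(IQueryable<Product> all, DateTime date) =>
        all.AsEnumerable()
           .Where(p => CustomFunction(p.ProductionDate) == date);
}
```

With a real repository the first argument would be _productRepository.GetAll(); nothing is read until the caller enumerates the result.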
Next, you really don’t need two queries here. The second query is just a subset of the first, so if you know that you’ll always be using both queries then you should just perform the first query, bring the results into memory (i.e. with a ToList that you store) and then use a Where on that to filter the results further. This will avoid another database trip as well as all of the data processing/filtering.
If you won’t always be using both queries, but will sometimes use just one or the other, then you can improve the second query by filtering on Faulty before getting all items. Add Where(p => p.Faulty) before you call AsEnumerable and filter on the date information after calling AsEnumerable (and that’s if you can’t convert any of the date filtering to filtering that can be done at the database).
It appears that in the end you only need to compute the ratio of items that are faulty as compared to the total. That can easily be done with a single query, rather than two.
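The two cases above could look roughly like this. It is only a sketch: the Product class, the CustomFunction body, the GetBoth helper and the query parameter are all assumed for illustration and are not taken from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the entity.
public class Product
{
    public DateTime ProductionDate { get; set; }
    public bool Faulty { get; set; }
}

public static class FaultQueries
{
    // Hypothetical stand-in for the question's CustomFunction.
    public static DateTime CustomFunction(DateTime value) => value.Date;

    // Both results needed: enumerate the source once, store the filtered
    // list, then derive the faulty subset from it in memory.
    public static (List<Product> Products, List<Product> Faulty) GetBoth(
        IQueryable<Product> all, DateTime date)
    {
        var products = all.AsEnumerable()
            .Where(p => CustomFunction(p.ProductionDate) == date)
            .ToList();
        var faulty = products.Where(p => p.Faulty).ToList();
        return (products, faulty);
    }

    // Only the faulty products needed: the Faulty filter is applied while the
    // query is still an IQueryable, so a query provider can run it at the
    // database; the custom date filter runs in memory afterwards.
    public static IEnumerable<Product> GetFaultyProducts(
        IQueryable<Product> all, DateTime date) =>
        all.Where(p => p.Faulty)
           .AsEnumerable()
           .Where(p => CustomFunction(p.ProductionDate) == date);
}
```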
You’ve said that Count is running really slowly in your code, but that’s not really true. Count is simply the method that is actually enumerating your query, whereas all of the other methods were simply building the query, not executing it. However, you can cut your performance costs drastically by combining the queries entirely.
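One way to combine the queries, sketched under the same assumptions (an illustrative Product class, a CustomFunction that drops the time of day, and the query passed in as a parameter), is to count both totals in a single pass. The zero-products guard is an addition of this sketch, not something the question specifies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Assumed shape of the entity.
public class Product
{
    public DateTime ProductionDate { get; set; }
    public bool Faulty { get; set; }
}

public static class FaultStatistics
{
    // Hypothetical stand-in for the question's CustomFunction.
    public static DateTime CustomFunction(DateTime value) => value.Date;

    // Enumerates the source once and keeps two counters, so no list is
    // built and the data is not read a second time.
    public static float GetFaultRate(IQueryable<Product> all, DateTime date)
    {
        int total = 0, faulty = 0;
        foreach (var p in all.AsEnumerable())
        {
            if (CustomFunction(p.ProductionDate) != date) continue;
            total++;
            if (p.Faulty) faulty++;
        }
        // Cast before dividing so the result is not truncated by integer
        // division; return 0 when no product matches (assumed behaviour).
        return total == 0 ? 0f : (float)faulty / total;
    }
}
```

GetProducts and GetFaultyProducts can stay in place for the other callers; only GetFaultRate needs the combined pass.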