I need to write a query pulling distinct values from columns defined by a user for any given data set. There could be millions of rows so the statements must be as efficient as possible. Below is the code I have.
What is the order of this LINQ query? Is there a more efficient way of doing this?
var MyValues = from r in MyDataTable.AsEnumerable()
               orderby r.Field<double>(_varName)
               select r.Field<double>(_varName);
IEnumerable result = MyValues.Distinct();
I can't speak much to the AsEnumerable() call or the field conversions, but for the LINQ side of things, the orderby is a stable quick sort and should be O(n log n). If I had to guess, everything but the orderby should be O(n), so overall you're still just O(n log n).

Update: the LINQ Distinct() call should also be O(n).

So altogether, the Big-Oh for this thing is still O(n log n), up to some constant factor K.
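
On the "more efficient" part of the question, one option is to de-duplicate before sorting, so the sort only sees the d distinct values rather than all n rows. That works out to roughly O(n) expected time for the hash-based Distinct() plus O(d log d) for the sort, which helps a lot when the column has many repeats. Below is a minimal sketch of that reordering, not a drop-in replacement: the SortedDistinct helper and the "Price" column are made up for illustration, and it assumes the column holds non-null doubles, since Field<double> throws on DBNull.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DistinctColumnValues
{
    // Hypothetical helper (not from the original post): returns the distinct
    // values of a double column, sorted ascending.
    public static List<double> SortedDistinct(DataTable table, string columnName)
    {
        return table.AsEnumerable()
            .Select(r => r.Field<double>(columnName)) // one pass over all n rows
            .Distinct()                               // hash-based de-duplication
            .OrderBy(v => v)                          // sorts only the distinct values
            .ToList();
    }

    public static void Main()
    {
        // Tiny in-memory table standing in for the real data set.
        var table = new DataTable();
        table.Columns.Add("Price", typeof(double));
        foreach (var v in new[] { 3.0, 1.0, 3.0, 2.0, 1.0 })
        {
            table.Rows.Add(v);
        }

        var result = SortedDistinct(table, "Price");
        Console.WriteLine(string.Join(", ", result));
    }
}
```

Sorting after Distinct() also means the result does not depend on Distinct() preserving the order of its input, which the documentation does not promise. If the column is nearly all unique values, d is close to n and this is no better than the original query.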