I am using the following query
var queryList1Only = (from file in list1
select file).Except(list2, myFileCompare);
where myFileCompare compares two files based on their name and length.
The query returned results when list1 and list2 were small (about 100 files in my tests). When I increased list1 to 30,000 files and list2 to 20,000 files, the query reported "Function Evaluation Timed Out".
I searched online and found that debugging can cause this, so I removed all the breakpoints and ran the code again. Now the program just freezes and produces no output for queryList1Only, which I am trying to print to check it.
EDIT:
This is the code for myFileCompare
class FileCompare : System.Collections.Generic.IEqualityComparer<System.IO.FileInfo>
{
    public FileCompare() { }

    public bool Equals(System.IO.FileInfo f1, System.IO.FileInfo f2)
    {
        return (f1.Name == f2.Name && f1.Directory.Name == f2.Directory.Name &&
                f1.Length == f2.Length);
    }

    // Return a hash that reflects the comparison criteria. According to the
    // rules for IEqualityComparer<T>, if Equals is true, then the hash codes must
    // also be equal. Because equality as defined here is a simple value equality, not
    // reference identity, it is possible that two or more objects will produce the same
    // hash code.
    public int GetHashCode(System.IO.FileInfo fi)
    {
        string s = String.Format("{0}{1}", fi.Name, fi.Length);
        return s.GetHashCode();
    }
}
What do you need to do with the items returned by the query?
Heavy operations like this are best run on a separate thread, so that they do not freeze the program the way you have just experienced.
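As a rough illustration of running the query off the main thread, here is a minimal sketch. It assumes .NET 4.5 or later for Task.Run; the class and method names (BackgroundQuery, ExceptAsync) are hypothetical and not part of the original code.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BackgroundQuery
{
    // Hypothetical helper: evaluates Except on a thread-pool thread.
    public static Task<List<T>> ExceptAsync<T>(
        IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer)
    {
        // ToList() forces the deferred LINQ query to execute inside the task,
        // rather than later on whichever thread happens to enumerate the result.
        return Task.Run(() => first.Except(second, comparer).ToList());
    }
}
```

It could then be called as `var queryList1Only = await BackgroundQuery.ExceptAsync(list1, list2, myFileCompare);`. This keeps the calling thread responsive, but it does not make the comparison itself any cheaper.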
EDIT: An idea
For example, you can try the following algorithm:
1. Sort both lists with QuickSort (List<T>.Sort() uses it by default); it will be pretty fast with a good implementation of GetHashCode().
2. Traverse the lists with a for() loop and compare the elements at the same index.
I believe sorted arrays will give you much better performance. I believe the complexity of Except() is O(m*n).
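The sort-then-walk idea might be sketched as follows. Two things here are assumptions rather than part of the original suggestion: sorting needs an ordering, so the sketch takes an IComparer<T> instead of the equality comparer, and because comparing strictly at the same index only lines up when both lists hold the same items, it advances two indices independently (a merge-style walk). The names SortedDifference and OnlyInFirst are hypothetical.

```csharp
using System.Collections.Generic;

public static class SortedDifference
{
    // Hypothetical helper: items of 'first' with no equal item in 'second',
    // where "equal" means order.Compare(x, y) == 0.
    public static List<T> OnlyInFirst<T>(List<T> first, List<T> second, IComparer<T> order)
    {
        // Sort copies so the caller's lists are left untouched.
        var a = new List<T>(first);
        var b = new List<T>(second);
        a.Sort(order);
        b.Sort(order);

        var result = new List<T>();
        int i = 0, j = 0;
        while (i < a.Count)
        {
            if (j >= b.Count) { result.Add(a[i++]); continue; } // b is exhausted
            int c = order.Compare(a[i], b[j]);
            if (c < 0) result.Add(a[i++]); // a[i] is smaller than everything left in b
            else if (c > 0) j++;           // b[j] is smaller, skip past it
            else i++;                      // match: drop a[i], keep b[j] for repeats in a
        }
        return result;
    }
}
```

Unlike Except(), this sketch keeps duplicates from the first list when they have no match in the second.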
EDIT: Another idea, which should be really fast
Put the items of one list into a Set<T>, then traverse the other list and look each item up in the Set<T>; it would be VERY fast! Basically O(m log m) + O(n), because you only need to traverse a single array, and searching within a set with a good hash function (use the GetHashCode() I've provided, with the updated logic) is very quick. Try it out!
EDIT: More details regarding the equality logic were provided in comments
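A minimal sketch of the set-based approach is shown below. It assumes HashSet<T> from System.Collections.Generic as the set type; the names ListDifference and OnlyInFirst are hypothetical. The comparer's Equals and GetHashCode decide what counts as a match, so a comparer like the one in the question can be passed in unchanged.

```csharp
using System.Collections.Generic;

public static class ListDifference
{
    // Hypothetical helper: items of 'first' that have no match in 'second'.
    public static List<T> OnlyInFirst<T>(
        IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer)
    {
        // Build the set once from the second list; lookups then rely on
        // the comparer's GetHashCode and Equals.
        var seen = new HashSet<T>(second, comparer);
        var result = new List<T>();
        foreach (var item in first)
        {
            // Add returns false when an equal item is already in the set, which
            // filters out matches from 'second' and repeats within 'first'.
            if (seen.Add(item))
                result.Add(item);
        }
        return result;
    }
}
```

Usage would look like `var queryList1Only = ListDifference.OnlyInFirst(list1, list2, myFileCompare);`. How fast this is in practice depends on how well GetHashCode spreads the items: if many files share a hash code, each lookup falls back to repeated Equals calls.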
Try out this implementation
Useful links: