Here are my classes:
public class XDetail
{
    public string Name { get; set; }
    public int ID { get; set; }
}
public class X
{
    public int XID { get; set; }
    public int ID { get; set; }
}
The ID is shared between them to link X and XDetail (a one-to-many relationship); X and XDetail are really typed DataRows. I read in a file using the following LINQ query and shape an anonymous type:
var results = (from line in File.ReadAllLines(file)
               select new
               {
                   XID = int.Parse(line.Substring(0, 8).TrimStart('0')),
                   Name = line.Substring(8, 255).Trim()
               }).ToList();
This data is used to check against the existing X/XDetail rows to make the appropriate changes or add new records. I am wrapping the results in a check to see if it throws on the .ToList() when the sequence has no results. XList is a List<X> and XDetailList is a List<XDetail>.
From there I attempt a fancy LINQ query to match up the appropriate items:
var changedData = from x in XList
                  join xDetail in XDetailList on x.ID equals xDetail.ID
                  where
                      (!results.Any(p => p.XID.Equals(x.XID))
                      || !results.Any(p => p.Name.Equals(xDetail.Name)))
                  select new
                  {
                      XValue = x,
                      XDetailValue = xDetail,
                      Result = (from result in results
                                where result.Name.Equals(xDetail.Name)
                                select result).SingleOrDefault()
                  };
My new problem is that this query only gives me what has changed in X/XDetail and not what is new. To get what is new I have to run another query. That seemed fine while testing on small data sets (3 existing entries of X/XDetail), but when I tried the real file and went to churn through its ~7700 entries, the processing never seemed to finish.
For a sample data set of the following already contained in X/XDetail:
XID: 1, Name: Bob, ID: 10
XID: 2, Name: Joe, ID: 20
XID: 3, Name: Sam, ID: 30
With a results file containing:
XID: 2, Name: Bob2
XID: 3, Name: NotSam
XID: 4, Name: NewGuy
XID: 5, Name: NewGuy2
I’d like to be able to get a result set containing:
{XID: 2, Name: Bob2}, x, xDetail
{XID: 3, Name: NotSam}, x, xDetail
{XID: 4, Name: NewGuy}, x, xDetail
{XID: 5, Name: NewGuy2}, x, xDetail
I’d like the x and xDetail as part of the result set so that I can use those typed data rows to make the necessary changes.
I tried my hand at making such a query:
var newData = from result in results
              join x in XList on result.XID equals x.XID
              join xDetail in XDetailList on x.ID equals xDetail.ID
              where
                  (x.XID == result.XID && xDetail.Name != result.Name)
              select new
              {
                  XValue = x,
                  XDetailValue = xDetail,
                  Result = result
              };
As the joins indicate, I’m only ever going to get the changed items in the data. I really want to be able to add in the data that isn’t in X/XDetail, and to stop my system, which has been processing my ~7700-entry change file for the past 2.5 hours. I feel like I have stared at this and related queries for too long to spot how I should shape the where clause.
Is there a way to structure the LINQ query to find both the changed data and the data that does not exist in X/XDetail, and return that as a new result set to process?
I think your performance problems are related to the complexity of your queries, which is probably around O(n^2): for every joined row, each results.Any(...) call scans the whole results list. Hence, I suggest first putting the current data into a lookup structure, like this (*):
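A minimal sketch of such a lookup, assuming plain classes in place of the typed DataRows and the three sample rows from the question. The name existingByXid is only illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample rows from the question; the real code uses typed DataRows instead.
var xList = new List<X>
{
    new X { XID = 1, ID = 10 },
    new X { XID = 2, ID = 20 },
    new X { XID = 3, ID = 30 },
};
var xDetailList = new List<XDetail>
{
    new XDetail { Name = "Bob", ID = 10 },
    new XDetail { Name = "Joe", ID = 20 },
    new XDetail { Name = "Sam", ID = 30 },
};

// Join X to XDetail once, then index the joined pairs by XID.
var existingByXid = (from x in xList
                     join xDetail in xDetailList on x.ID equals xDetail.ID
                     select new { XValue = x, XDetailValue = xDetail })
                    .ToLookup(pair => pair.XValue.XID);

// Each probe is now a hash lookup instead of a scan over the whole list.
Console.WriteLine(existingByXid.Contains(2));                    // True
Console.WriteLine(existingByXid.Contains(4));                    // False
Console.WriteLine(existingByXid[3].Single().XDetailValue.Name);  // Sam

public class X { public int XID { get; set; } public int ID { get; set; } }
public class XDetail { public string Name { get; set; } = ""; public int ID { get; set; } }
```

A lookup (rather than a dictionary) is used here because the relationship is one to many, so one XID may map to several X/XDetail pairs.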
Now, I’m not sure, but I assume that by “changed data” you mean the entries whose XID already exists but whose name is new. Is that right?
If so, you can get “changed data” using this query:
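One way such a query could look, as a sketch under the assumption that "changed" means "same XID, different Name". The sample data comes from the question, and existingByXid is an illustrative name for a lookup of joined X/XDetail pairs keyed by XID:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Existing rows (plain classes standing in for the typed DataRows).
var xList = new List<X>
{
    new X { XID = 1, ID = 10 }, new X { XID = 2, ID = 20 }, new X { XID = 3, ID = 30 },
};
var xDetailList = new List<XDetail>
{
    new XDetail { Name = "Bob", ID = 10 },
    new XDetail { Name = "Joe", ID = 20 },
    new XDetail { Name = "Sam", ID = 30 },
};
// Parsed file contents, shaped like the anonymous type in the question.
var results = new[]
{
    new { XID = 2, Name = "Bob2" },
    new { XID = 3, Name = "NotSam" },
    new { XID = 4, Name = "NewGuy" },
    new { XID = 5, Name = "NewGuy2" },
}.ToList();

// Joined X/XDetail pairs indexed by XID.
var existingByXid = (from x in xList
                     join xDetail in xDetailList on x.ID equals xDetail.ID
                     select new { XValue = x, XDetailValue = xDetail })
                    .ToLookup(pair => pair.XValue.XID);

// For each file entry, fetch the pairs with the same XID and keep those whose name differs.
var changedData = (from result in results
                   from existing in existingByXid[result.XID]
                   where existing.XDetailValue.Name != result.Name
                   select new
                   {
                       XValue = existing.XValue,
                       XDetailValue = existing.XDetailValue,
                       Result = result
                   }).ToList();

foreach (var c in changedData)
    Console.WriteLine($"{c.Result.XID}: {c.XDetailValue.Name} -> {c.Result.Name}");
// Prints:
// 2: Joe -> Bob2
// 3: Sam -> NotSam

public class X { public int XID { get; set; } public int ID { get; set; } }
public class XDetail { public string Name { get; set; } = ""; public int ID { get; set; } }
```

Note that if an X has several XDetail rows, this sketch reports every detail row whose name differs from the file entry; whether that is the desired behaviour depends on your data.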
Then, if by “new data” you mean the entries with a new XID (one not currently present in XList/XDetailList), you cannot match them with X/XDetail elements because there aren’t any, so that’s simply:
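A sketch of that filter, again using the question's sample data and an illustrative existingByXid lookup keyed by XID:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Existing rows (plain classes standing in for the typed DataRows).
var xList = new List<X>
{
    new X { XID = 1, ID = 10 }, new X { XID = 2, ID = 20 }, new X { XID = 3, ID = 30 },
};
var xDetailList = new List<XDetail>
{
    new XDetail { Name = "Bob", ID = 10 },
    new XDetail { Name = "Joe", ID = 20 },
    new XDetail { Name = "Sam", ID = 30 },
};
// Parsed file contents, shaped like the anonymous type in the question.
var results = new[]
{
    new { XID = 2, Name = "Bob2" },
    new { XID = 3, Name = "NotSam" },
    new { XID = 4, Name = "NewGuy" },
    new { XID = 5, Name = "NewGuy2" },
}.ToList();

// Joined X/XDetail pairs indexed by XID.
var existingByXid = (from x in xList
                     join xDetail in xDetailList on x.ID equals xDetail.ID
                     select new { XValue = x, XDetailValue = xDetail })
                    .ToLookup(pair => pair.XValue.XID);

// New entries are the file entries whose XID has no existing X/XDetail pair.
var newData = results.Where(result => !existingByXid.Contains(result.XID)).ToList();

foreach (var n in newData)
    Console.WriteLine($"{n.XID}: {n.Name}");
// Prints:
// 4: NewGuy
// 5: NewGuy2

public class X { public int XID { get; set; } public int ID { get; set; } }
public class XDetail { public string Name { get; set; } = ""; public int ID { get; set; } }
```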
(*)
Actually, to be even faster, you could arrange your data in a dictionary of dictionaries, where the outer key is the XID and the inner key is the Name.
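A rough sketch of that layout, assuming names are unique within a given XID (the inner ToDictionary call throws on duplicates). The variable names and the three probe entries are only illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Existing rows (plain classes standing in for the typed DataRows).
var xList = new List<X>
{
    new X { XID = 1, ID = 10 }, new X { XID = 2, ID = 20 }, new X { XID = 3, ID = 30 },
};
var xDetailList = new List<XDetail>
{
    new XDetail { Name = "Bob", ID = 10 },
    new XDetail { Name = "Joe", ID = 20 },
    new XDetail { Name = "Sam", ID = 30 },
};

// Outer key: XID. Inner key: Name. Value: the joined X/XDetail pair.
var existing = (from x in xList
                join xDetail in xDetailList on x.ID equals xDetail.ID
                select new { XValue = x, XDetailValue = xDetail })
               .GroupBy(pair => pair.XValue.XID)
               .ToDictionary(
                   group => group.Key,
                   group => group.ToDictionary(pair => pair.XDetailValue.Name));

// Classify a few file entries with two constant-time probes each.
var statuses = new List<string>();
foreach (var (xid, name) in new[] { (2, "Joe"), (2, "Bob2"), (4, "NewGuy") })
{
    string status = !existing.TryGetValue(xid, out var byName) ? "new"
                  : byName.ContainsKey(name) ? "unchanged"
                  : "changed";
    statuses.Add(status);
    Console.WriteLine($"{xid} {name}: {status}");
}
// Prints:
// 2 Joe: unchanged
// 2 Bob2: changed
// 4 NewGuy: new

public class X { public int XID { get; set; } public int ID { get; set; } }
public class XDetail { public string Name { get; set; } = ""; public int ID { get; set; } }
```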