I’m using LINQPad to learn LINQ and I’ve run into a stumbling block.
The goal is to get a list of Network Ids, Network Names and how many Stations each has.
Here is my original SQL:
SELECT n.iStationId AS NetworkID, n.sPrettyName AS NetworkName, COUNT(s.iStationID) AS StationCount
FROM T_StationInfo AS s, T_StationInfo as n
WHERE s.iNetworkId = n.iStationId
GROUP BY n.sPrettyName, n.iStationId
ORDER BY COUNT(s.iStationID) DESC
Here is my LINQ:
from s in T_stationInfo
from n in T_stationInfo
where s.INetworkID == n.IStationID
group s by s.INetworkID into stations
orderby stations.Count(x => x.INetworkID == stations.Key) descending
select new {
    NetworkId = stations.Key,
    NetworkName = T_stationInfo.Single(x => x.IStationID == stations.Key).SPrettyName,
    StationCount = stations.Count(x => x.INetworkID == stations.Key)
};
The LINQ query takes 5 times longer to execute. Looking at the SQL that the LINQ statement generates, it pulls in the t_stationInfo table 7 times.
I believe this is because I am misusing LINQ but I don’t see where or how.
What LINQ statement would create equivalent SQL or, at least, SQL that doesn’t perform so poorly?
A couple notes:
- The structure of the table/database cannot be changed.
- This question is more about learning to use LINQ than getting the list of ids, names, and counts.
- I do appreciate it! 🙂
–EDIT–
Just to clarify the structure:
Each row in the table is an entity that has various information (name, contact, etc) and can have a parent. Those parents are also in the table. In this case parents can’t have parents. Their parent field is NULL or 0.
So to get the Name of the Parent of a Station (called Network in the table), I pull the station info table in twice and join the parent id (network id) to the entity id (station id) so that on a single row I have the station’s info and the parent’s info. Hence the two froms of the same table.
Did that make sense?
–EDIT2–
This is the sql generated by the original LINQ query:
SELECT [t2].[iNetworkID] AS [NetworkId], (
SELECT [t5].[sPrettyName]
FROM [t_stationInfo] AS [t5]
WHERE (CONVERT(Decimal(29,4),[t5].[iStationID])) = [t2].[iNetworkID]
) AS [NetworkName], (
SELECT COUNT(*)
FROM [t_stationInfo] AS [t6], [t_stationInfo] AS [t7]
WHERE ([t6].[iNetworkID] = [t2].[iNetworkID]) AND ([t2].[iNetworkID] = [t6].[iNetworkID]) AND ([t6].[iNetworkID] = (CONVERT(Decimal(29,4),[t7].[iStationID])))
) AS [StationCount]
FROM (
SELECT [t0].[iNetworkID]
FROM [t_stationInfo] AS [t0], [t_stationInfo] AS [t1]
WHERE [t0].[iNetworkID] = (CONVERT(Decimal(29,4),[t1].[iStationID]))
GROUP BY [t0].[iNetworkID]
) AS [t2]
ORDER BY (
SELECT COUNT(*)
FROM [t_stationInfo] AS [t3], [t_stationInfo] AS [t4]
WHERE ([t3].[iNetworkID] = [t2].[iNetworkID]) AND ([t2].[iNetworkID] = [t3].[iNetworkID]) AND ([t3].[iNetworkID] = (CONVERT(Decimal(29,4),[t4].[iStationID])))
) DESC
I don’t know how big of an impact this will have on your performance, if any. But when I look at your query I see the same function call written twice:
stations.Count(x => x.INetworkID == stations.Key)
Does using a let clause improve performance at all?
I feel like there should also be a better way to assign the NetworkName property, but I’m not sure.
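Here is a rough sketch of what I mean. It is not tested against your database, so treat it as a starting point and check the SQL that LINQPad generates. The count is computed once in a let clause. Since every row in a group already has the group's key as its network id, the predicate inside Count() is redundant and a plain Count() gives the same number. For NetworkName, one option is to join the table to itself and group by both the network's id and its name, the same way your SQL does, so no Single() lookup is needed. The in-memory array and its sample rows are my own stand-in for T_stationInfo (with plain int ids and 0 meaning "no parent"); in LINQPad you would query the real table instead.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory stand-in for the T_stationInfo table.
// Assumption: INetworkID = 0 marks a row that is itself a network (no parent).
var T_stationInfo = new[]
{
    new { IStationID = 1, INetworkID = 0, SPrettyName = "Network A" },
    new { IStationID = 2, INetworkID = 0, SPrettyName = "Network B" },
    new { IStationID = 3, INetworkID = 1, SPrettyName = "Station A1" },
    new { IStationID = 4, INetworkID = 1, SPrettyName = "Station A2" },
    new { IStationID = 5, INetworkID = 2, SPrettyName = "Station B1" },
};

var results =
    (from station in T_stationInfo
     // Self-join: match each station to the row that is its parent network.
     join network in T_stationInfo
         on station.INetworkID equals network.IStationID
     // Group by id and name together, mirroring GROUP BY n.sPrettyName, n.iStationId.
     group station by new { network.IStationID, network.SPrettyName } into stations
     // Compute the count once and reuse it for both ordering and output.
     let stationCount = stations.Count()
     orderby stationCount descending
     select new
     {
         NetworkId = stations.Key.IStationID,
         NetworkName = stations.Key.SPrettyName,
         StationCount = stationCount
     }).ToList();

foreach (var r in results)
    Console.WriteLine($"{r.NetworkId}\t{r.NetworkName}\t{r.StationCount}");
```

If your INetworkID and IStationID columns have different types (the CONVERT calls in your generated SQL suggest they might), the two sides of the join's equals would need a cast so the types match.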
Oh, and sorry for renaming the variables. I changed s to station and n to network to help me follow it a little better.