I have two entities, User and UserPermission. The User entity contains all your normal fields, Id, Username, Email, etc and the UserPermission entity has two values, UserId and PermissionId. I have written a repository method GetUserWithPermissions that originally utilized the Include extension and did something like this:
return dbContext.Users.Include(u => u.UserPermission).Where(u => u.Username.Equals(username)).FirstOrDefault();
It works great but the issues is that there are going to be a bunch of UserPermission entities associated with any given user and using the Include extension essentially just flattens the two tables into one so ALL of the user fields are repeated for every single UserPermission associated with a User. The returned data looks something like this:
Id Username Email ... PermissionId
1 johndoe john@email.com 1
1 johndoe john@email.com 2
1 johndoe john@email.com 3
1 johndoe john@email.com 4
1 johndoe john@email.com 5
1 johndoe john@email.com 6
1 johndoe john@email.com 7
The only difference between each row is the last column PermissionId. If we have 50 permissions defined for the user, that is a large chunk of repeated data being returned when I do not think it is necessary. Obviously my other option is to do something like this:
User user = dbContext.Users.Where(u => u.Username.Equals(username)).FirstOrDefault();
if (user != null)
user.UserPermissions.ToList();
return user;
The above code accomplishes the same thing with drastically less data being returned but with the trade off that two trips are being made to the database.
Which method is better? Returning a lot of repeated data or making two trips to the database?
Here is the SQL query that is generated by the Entity Framework
SELECT
[Project2].[Id] AS [Id],
[Project2].[Username] AS [Username],
[Project2].[LoweredUsername] AS [LoweredUsername],
[Project2].[CompanyId] AS [CompanyId],
[Project2].[FirstName] AS [FirstName],
[Project2].[LastName] AS [LastName],
[Project2].[Email] AS [Email],
[Project2].[C1] AS [C1],
[Project2].[UserId] AS [UserId],
[Project2].[PermissionValue] AS [PermissionValue]
FROM ( SELECT
[Limit1].[Id] AS [Id],
[Limit1].[Username] AS [Username],
[Limit1].[LoweredUsername] AS [LoweredUsername],
[Limit1].[CompanyId] AS [CompanyId],
[Limit1].[FirstName] AS [FirstName],
[Limit1].[LastName] AS [LastName],
[Limit1].[Email] AS [Email],
[Extent2].[UserId] AS [UserId],
[Extent2].[PermissionValue] AS [PermissionValue],
CASE WHEN ([Extent2].[PermissionValue] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C1]
FROM (SELECT TOP (1)
[Extent1].[Id] AS [Id],
[Extent1].[Username] AS [Username],
[Extent1].[LoweredUsername] AS [LoweredUsername],
[Extent1].[CompanyId] AS [CompanyId],
[Extent1].[FirstName] AS [FirstName],
[Extent1].[LastName] AS [LastName],
[Extent1].[Email] AS [Email]
FROM [dbo].[Users] AS [Extent1]
WHERE [Extent1].[LoweredUsername] = (LOWER(LTRIM(RTRIM(@p__linq__0)))) ) AS [Limit1]
LEFT OUTER JOIN [dbo].[UserPermissions] AS [Extent2] ON [Limit1].[Id] = [Extent2].[UserId]
) AS [Project2]
ORDER BY [Project2].[Id] ASC, [Project2].[C1] ASC
Thanks
Nick
It’s the way how it works.
Includeof collections leads indeed to duplication of the columns of the parent entity (see here for great example and explanation: How many Include I can use on ObjectSet in EntityFramework to retain performance?)And you have a trade-off without a general rule which way is better: One roundtrip with
Includebut duplicated data or two roundtrips without duplicated data. What is better/more performant? I think you have to measure it case by case if you want an exact answer.I could imagine that as a rule of thumb we could say: If the parent has many columns and the child collection only a few and the child collection can possibly be very long, then this is a candidate to prefer two roundtrips to avoid the data duclication.
If you don’t want eager loading with
Includeyou can either rely on lazy loading or you can use explicite loading: