What is best practice, and what delivers the best performance?
I currently have a query with many LEFT JOINs that fetches a user and all his data, like friends, friend requests, and so on:
SELECT
`user`.`id` AS `user_id`,
`user`.`name` AS `user_name`,
`manager`.`id` AS `manager_id`,
`competition`.`id` AS `manager_competition_id`,
`competition`.`name` AS `manager_competition_name`,
`competition`.`week` AS `manager_competition_week`,
`country`.`id` AS `manager_competition_country_id`,
`country`.`name` AS `manager_competition_country_name`,
`club_template`.`id` AS `manager_club_template_id`,
`club_template`.`name` AS `manager_club_template_name`,
`club`.`id` AS `manager_club_id`,
`club`.`name` AS `manager_club_name`,
`club`.`ready` AS `manager_club_ready`,
`friend`.`friend_id` AS `friend_id`,
`friend_user`.`name` AS `friend_name`
FROM
`users` AS `user`
LEFT JOIN
`managers` AS `manager`
ON
`manager`.`user_id` = `user`.`id`
LEFT JOIN
`competitions` AS `competition`
ON
`competition`.`id` = `manager`.`competition_id`
LEFT JOIN
`countries` AS `country`
ON
`country`.`id` = `competition`.`country_id`
LEFT JOIN
`club_templates` AS `club_template`
ON
`club_template`.`id` = `manager`.`club_template_id`
LEFT JOIN
`clubs` AS `club`
ON
`club`.`id` = `manager`.`club_id`
LEFT JOIN
`friends` AS `friend`
ON
`friend`.`user_id` = `user`.`id`
LEFT JOIN
`users` AS `friend_user`
ON
`friend_user`.`id` = `friend`.`friend_id`
WHERE
`user`.`id` = 1
As you can see, it’s a very big query. My reasoning behind this was that it’s better to have just one query that can be done in one API request, like this…
/api/users/1
…versus a few queries, each in their own API request, like this…
/api/users/1
/api/users/1/friends
/api/users/1/friend_requests
/api/users/1/managers
But now I’m worried that the query has become so large that it will hurt performance more than splitting it up into separate API requests would.
What will scale better?
Update
I’ve updated the question to show the full query. This is not the final query; I plan to add even more joins (or not, depending on the answers).
Each table has a PRIMARY KEY on id. All association columns (competition_id, club_id, and so on) have a regular INDEX. The database engine is InnoDB.
Of the two, I would recommend the latter: many niche queries. It gives the caller flexibility to pull back just what they want, and is less likely to silently introduce performance problems (e.g. only one option to retrieve data, so everyone uses it no matter how small a subset of that data they’re actually interested in).
That said, it certainly isn’t immune from performance problems; it just means the caller may be more aware of them by virtue of issuing so many API calls.
You could provide both though. Make it clear from your naming convention that the expensive version pulls back all data and is for use when the user might otherwise need to make, say, 20 – 30 calls to get the full picture.
Examples:
1 – imagine having to get that full user object just to find out the name. Really wasteful. And if done inadvertently in a big loop, a performance trap waiting to happen. Prefer a getUserName(id) method that just reads that one value back.
2 – on the other hand, if you want to display the user’s full profile in a page, then a full getFullUserProfile(id) is most efficient (1 call rather than 10 – 20).
Edit – a further useful example. Anticipate where many values are sought, e.g. rather than force the caller to run getUserName(id) 500 times to get all names for a certain condition (all admin users perhaps?), provide a List<String> getAdminUserNames() which provides all that data in one call.
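To make the three shapes above concrete, here is a minimal sketch in Java. It is only an illustration: the class UserApiSketch, the User and FullUserProfile records, the sample() data, and the in-memory map standing in for the database are all assumptions of mine, not part of the question's schema. In a real service each method would be backed by its own query or endpoint.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical in-memory stand-in for the database; every name here is illustrative.
final class UserApiSketch {

    record User(int id, String name, boolean admin, List<Integer> friendIds) {}

    // Roughly what the single large JOIN query would hand back.
    record FullUserProfile(User user, List<String> friendNames) {}

    private final Map<Integer, User> users = new LinkedHashMap<>();

    UserApiSketch(List<User> seed) {
        for (User u : seed) {
            users.put(u.id(), u);
        }
    }

    // Narrow call: reads back one value only.
    Optional<String> getUserName(int id) {
        return Optional.ofNullable(users.get(id)).map(User::name);
    }

    // Expensive call: the name signals that everything is loaded.
    Optional<FullUserProfile> getFullUserProfile(int id) {
        User u = users.get(id);
        if (u == null) {
            return Optional.empty();
        }
        List<String> friendNames = new ArrayList<>();
        for (int friendId : u.friendIds()) {
            getUserName(friendId).ifPresent(friendNames::add);
        }
        return Optional.of(new FullUserProfile(u, friendNames));
    }

    // Batch call: one request instead of N calls to getUserName(id).
    List<String> getAdminUserNames() {
        List<String> names = new ArrayList<>();
        for (User u : users.values()) {
            if (u.admin()) {
                names.add(u.name());
            }
        }
        return names;
    }

    // Made-up sample data for trying the sketch out.
    static UserApiSketch sample() {
        return new UserApiSketch(List.of(
            new User(1, "alice", true, List.of(2, 3)),
            new User(2, "bob", false, List.of(1)),
            new User(3, "carol", true, List.of())));
    }

    public static void main(String[] args) {
        UserApiSketch api = sample();
        System.out.println(api.getUserName(1));
        System.out.println(api.getFullUserProfile(1));
        System.out.println(api.getAdminUserNames());
    }
}
```

The point of the sketch is the naming: a caller who only needs a name is steered towards getUserName(id), while getFullUserProfile(id) makes its cost obvious at the call site.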