I’m having an issue with a query and with scaling it for performance for users with a large number of friends. The goal of the query is to grab the top “activities” performed by your friends in the last 30 days. Here is my query:
SELECT a.activity_id, b.activity_name, count(a.activity_id) as total_count
FROM friends as f
INNER JOIN activities as a on (a.user_id = f.friend_id
and a.created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY))
INNER JOIN activity as b on a.activity_id = b.activity_id
WHERE f.user_id = 1 and f.is_approved = 1
GROUP by a.activity_id
ORDER by total_count DESC
LIMIT 5
This query takes about 25 seconds to run for every user, no matter how big or small their friend graph is. Indexes are below:
Table: activities
PRIMARY: [act_id] Other: [activity_id, user_id], [user_id, created_at], [created_at]
Table: friends
PRIMARY: [user_id, friend_id] Other: [user_id, is_approved], [friend_id]
Table: activity
PRIMARY: [activity_id]
Any help would be greatly appreciated.
UPDATE: Here is the EXPLAIN output:
id  select_type  table  type    key            key_len  ref           rows  Extra
1   SIMPLE       F      ref     friend_lookup  5        const,const   795   Using temporary; Using filesort
1   SIMPLE       A      ref     user_id        4        F.friend_id   58    Using where
1   SIMPLE       B      eq_ref  PRIMARY        4        P.activty_id  1     Using where
Robin is correct on the date field. If you are using a function, it will have to compute that for however many entries it's scanning against. My version uses a MySQL user variable: I calculate the date ONCE into @StartDate and use THAT value in the join clause.
The only additional thing I changed was adding the “STRAIGHT_JOIN” clause. In many cases, I’ve found that it has helped me and others optimize a query. It prevents MySQL from trying to interpret the query in another way, for example by looking at the activity table first because it's the smaller table and then back-linking from that one. “STRAIGHT_JOIN” tells the optimizer to join the tables in the order you’ve listed.
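A minimal sketch of that approach, assuming the table and column names from the question (the literal user ID 1 is just the question's example value). It computes the cutoff once into @StartDate and adds the STRAIGHT_JOIN modifier so the tables are joined in the order written; whether it is actually faster would need to be confirmed with EXPLAIN on the real data.

```sql
-- Compute the 30-day cutoff once and keep it in a session variable.
SET @StartDate := DATE_SUB(NOW(), INTERVAL 30 DAY);

-- STRAIGHT_JOIN forces the join order: friends, then activities, then activity.
SELECT STRAIGHT_JOIN
       a.activity_id,
       b.activity_name,
       COUNT(*) AS total_count
FROM friends AS f
INNER JOIN activities AS a
        ON a.user_id = f.friend_id
       AND a.created_at >= @StartDate
INNER JOIN activity AS b
        ON b.activity_id = a.activity_id
WHERE f.user_id = 1
  AND f.is_approved = 1
GROUP BY a.activity_id, b.activity_name
ORDER BY total_count DESC
LIMIT 5;
```

Grouping by both a.activity_id and b.activity_name keeps the statement valid when ONLY_FULL_GROUP_BY is enabled; it does not change the result, since each activity_id maps to one name.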
Per feedback
That being the case, and given this “rolling 30 days ago” cycle, I would then resort to building a table nightly that holds nothing but user ID, activity and count, and query from that instead.
Ensure you have an index on this daily aggregate table by (user ID, total count), then query it directly based on the friend ID, ordered by total_count descending with a limit of 5. It's a small price to pay to have a nightly trigger / event / script create this ONCE. How critical is it to see activity for the current date too? Is the activity so drastic that one day's activity would skew what you otherwise want to present to the user?
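One way the nightly aggregate could look, as a rough sketch. The table name user_activity_30d, the index name, the INT column types and the friends_total alias are all assumptions made for illustration; only friends, activities, activity and their columns come from the question. The rebuild statements would be run once a night by whatever scheduler you prefer.

```sql
-- Hypothetical summary table: one row per user and activity for the last 30 days.
CREATE TABLE IF NOT EXISTS user_activity_30d (
    user_id     INT NOT NULL,
    activity_id INT NOT NULL,
    total_count INT NOT NULL,
    PRIMARY KEY (user_id, activity_id),
    KEY idx_user_total (user_id, total_count)
);

-- Nightly rebuild: empty the table and recount the rolling 30-day window.
TRUNCATE TABLE user_activity_30d;

INSERT INTO user_activity_30d (user_id, activity_id, total_count)
SELECT user_id, activity_id, COUNT(*)
FROM activities
WHERE created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)
GROUP BY user_id, activity_id;

-- Per-request query: sum the precomputed counts across the user's approved friends.
SELECT s.activity_id,
       b.activity_name,
       SUM(s.total_count) AS friends_total
FROM friends AS f
INNER JOIN user_activity_30d AS s
        ON s.user_id = f.friend_id
INNER JOIN activity AS b
        ON b.activity_id = s.activity_id
WHERE f.user_id = 1
  AND f.is_approved = 1
GROUP BY s.activity_id, b.activity_name
ORDER BY friends_total DESC
LIMIT 5;
```

The read query still groups across friends, because the top five overall is a sum over several users' rows, but it now touches at most one row per friend and activity rather than every raw activity record in the window.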