I’m trying to write a SQL or ActiveRecord query to answer this question: “Of people who took at least one ride, what is the average number of metro lines that they’ve departed from?”
The schema is as follows:
- Ride: A trip from one location to another. It has a user_id for the User and a location_id for the Location.
- Location: A location is a stop along a line. The location has a line_id indicating what line it’s on. A location belongs to one line.
- Line: A line is a series of related metro stops, its locations.
- User: The person who took the trip.
It looks like I need to do two things:
- Given the Rides joined to their locations, count the number of distinct [rides.user_id, locations.line_id] combinations.
- Divide that by the number of users who’ve taken at least one ride.
The result will be the average, and thus the answer to the question.
- Does that sound right?
- If so, what’s the best way to do that?
I’m using Rails, so if I can express this in ARel or AR syntax without having to drop into SQL, that would be great. But I’ll take what I can get.
A commenter asked for an example. Let’s imagine that the data looks something like this:
rides                       locations
======================      ======================
user_id   location_id       location_id   line_id
1         1                 1             1
1         1                 2             1
1         1                 3             2
1         1                 4             3
2         1                 5             4
2         2                 6             5
2         3
3         3
3         4
3         5
3         6
We can see that user 1 took 4 rides, user 2 took 3 rides, and user 3 took 4 rides. Per user, these rides involved 1, 3, and 4 distinct locations, but only 1, 2, and 4 distinct lines, respectively. Thus, the average number of lines that a given user rode was (1 + 2 + 4) / 3, or 2.33....
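As a sanity check on the arithmetic, here is a minimal plain-Ruby sketch that applies the two steps I described to the sample data above. It uses in-memory arrays rather than ActiveRecord or SQL, and the constant and method names (RIDES, LINE_FOR_LOCATION, average_lines_per_rider) are made up for illustration. It is not meant as the query I’m looking for, only as a way to confirm that the approach gives the expected number.

```ruby
# Sample data from the tables above, as plain Ruby structures (no database).
# Each ride is [user_id, location_id].
RIDES = [
  [1, 1], [1, 1], [1, 1], [1, 1],
  [2, 1], [2, 2], [2, 3],
  [3, 3], [3, 4], [3, 5], [3, 6]
].freeze

# Maps location_id => line_id.
LINE_FOR_LOCATION = { 1 => 1, 2 => 1, 3 => 2, 4 => 3, 5 => 4, 6 => 5 }.freeze

# Hypothetical helper: the name and signature are assumptions for this sketch.
def average_lines_per_rider(rides, line_for_location)
  # Step 1: count the distinct [user_id, line_id] combinations.
  user_line_pairs = rides.map do |user_id, location_id|
    [user_id, line_for_location.fetch(location_id)]
  end.uniq

  # Step 2: divide by the number of users who took at least one ride.
  rider_count = rides.map(&:first).uniq.size
  return 0.0 if rider_count.zero?

  user_line_pairs.size.to_f / rider_count
end

puts average_lines_per_rider(RIDES, LINE_FOR_LOCATION).round(2) # prints 2.33
```

There are 7 distinct user/line pairs and 3 riders, so the result is 7 / 3, which matches the hand calculation of (1 + 2 + 4) / 3.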
SQL: