I have a query that’s used in a bowling center to sort all the best averages in descending order.
Everything works well, except when a player plays in two different leagues (or when I don’t restrict the query to one season, since averages reset after each season). In that case I want only the player’s best average, because I don’t want duplicates. (Averages in different leagues for the same player don’t accumulate, so a player can have more than one average.)
I thought I had solved this a while ago after asking on Stack Overflow (here), but I was recently told that the query sometimes returns wrong results, which I somehow hadn’t noticed earlier.
The problem is that even though I don’t get duplicate names and I do get the correct MAX average, the other columns, such as LeagueName, the number of games played and the season, aren’t always correct when a player plays in multiple leagues. Here’s the query:
SELECT PlayerID, Name, MAX(score) AS Avg, gamesCount, LeagueName, Season
FROM (
    -- inner query: one row per player, league and season
    SELECT PlayerID, Player.Name AS Name, Player.Gender AS Gender,
           ROUND(AVG(score), 2) AS score, COUNT(score) AS gamesCount,
           LeagueName, Season
    FROM Scores JOIN Players AS Player USING (PlayerID)
    WHERE Score > -1 AND bowlout = 'No' AND season = '2011-2012'
    GROUP BY PlayerID, LeagueName, Season
    HAVING gamesCount >= 50
) AS league_avg
WHERE Gender = 'Male'
GROUP BY PlayerID
ORDER BY Avg DESC LIMIT 0,50;
Obviously, it doesn’t work because the outer query groups only by PlayerID. It gets the player’s max average, but the other fields (LeagueName, gamesCount, Season) are not aggregated, so when a player bowls in several leagues their values are taken from an arbitrary one of those leagues.
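To make the ambiguity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The league_avg table is a hypothetical stand-in for the inner query's result, filled with the sample rows from this post; the PlayerID and gamesCount values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the inner query's output: one row per player per league.
# PlayerID and gamesCount values are invented for this example.
conn.execute(
    "CREATE TABLE league_avg ("
    "PlayerID INTEGER, Name TEXT, score REAL, gamesCount INTEGER, LeagueName TEXT)"
)
conn.executemany(
    "INSERT INTO league_avg VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Jones, Tom", 122.56, 60, "Friday League"),
        (2, "Smith, Adam", 182.42, 75, "Super League"),
        (2, "Smith, Adam", 194.25, 54, "Friendly League"),
    ],
)

# Every selected column is either grouped or aggregated, so this result is well defined.
# candidateRows counts how many league rows collapse into each player's group.
candidates = conn.execute(
    "SELECT Name, MAX(score) AS Avg, COUNT(*) AS candidateRows "
    "FROM league_avg "
    "GROUP BY PlayerID, Name "
    "ORDER BY Avg DESC"
).fetchall()

for row in candidates:
    print(row)
```

A player with candidateRows greater than 1 has several possible values for LeagueName and gamesCount, and grouping by PlayerID alone says nothing about which one should be returned.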
What I’d like is to get the leagueName (and all the other info) corresponding to the player and his max average.
Here’s an example:
Name        | AVG    | LeagueName
Jones, Tom  | 122.56 | Friday League
Smith, Adam | 182.42 | Super League
Smith, Adam | 194.25 | Friendly League
...
The expected result would be:
Name        | AVG    | LeagueName
Smith, Adam | 194.25 | Friendly League
Jones, Tom  | 122.56 | Friday League
What I’m getting:
Name        | AVG    | LeagueName
Smith, Adam | 194.25 | *Super League*
Jones, Tom  | 122.56 | Friday League
As you can see, Smith, Adam has the correct AVG, but the wrong league is associated with the Name/Avg combination.
I tried changing the outer GROUP BY clause to PlayerID, LeagueName, Season, but that just splits the rows per league and per season again, and the duplicates come back. I don’t know what else to try, other than fetching all the results in the Java application this query is part of and removing the duplicates there. Obviously, I’d rather get the correct results directly from the SQL query.
As a side note (it was mentioned earlier in this post), the query sometimes won’t have the AND season = '2011-2012' condition, so I must not get duplicates for the same player across different seasons either.
Edit: I’m using SQLite in case some people hadn’t noticed the tags.
As Andriy M posted in the comments, there is a workaround that makes the columns that are neither aggregated nor in the GROUP BY clause come from the same row as the max average.
The workaround is not safe with respect to future versions, because the behaviour it relies on is not defined in the SQLite documentation. It works for me in this particular case without slowing down the query, though, which is exactly what I wanted.
I don’t plan on upgrading my SQLite version in the future either as I already have plans to put my application online with a MySQL database instead, so I feel like posting this answer is justified since it solves my problem perfectly.
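For anyone who does need a version that does not depend on undefined behaviour (for example after a move to MySQL), here is a minimal sketch of one common alternative: compute each player's best average separately and join it back to the per-league rows. It is not the workaround described in this answer. The league_avg table is again a hypothetical stand-in for the inner query's result, and the PlayerID and gamesCount values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the inner query's output (one row per player, league, season).
# PlayerID and gamesCount values are invented for this example.
conn.execute(
    "CREATE TABLE league_avg ("
    "PlayerID INTEGER, Name TEXT, score REAL, gamesCount INTEGER, "
    "LeagueName TEXT, Season TEXT)"
)
conn.executemany(
    "INSERT INTO league_avg VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Jones, Tom", 122.56, 60, "Friday League", "2011-2012"),
        (2, "Smith, Adam", 182.42, 75, "Super League", "2011-2012"),
        (2, "Smith, Adam", 194.25, 54, "Friendly League", "2011-2012"),
    ],
)

# pb holds each player's best average; joining it back on (PlayerID, score)
# keeps only the league row that actually produced that average.
best = conn.execute(
    """
    SELECT la.Name, la.score AS Avg, la.gamesCount, la.LeagueName, la.Season
    FROM league_avg AS la
    JOIN (SELECT PlayerID, MAX(score) AS bestScore
          FROM league_avg
          GROUP BY PlayerID) AS pb
      ON pb.PlayerID = la.PlayerID AND pb.bestScore = la.score
    ORDER BY Avg DESC
    LIMIT 50
    """
).fetchall()

for row in best:
    print(row)
```

In the real query, league_avg would be the derived table from the question (for example materialised as a temporary table or view), which is an assumption of this sketch. One caveat: if a player has exactly the same best average in two leagues or seasons, the join returns both rows, so a tie-breaker would still be needed. As far as I know, SQLite 3.7.11 and later also document that, when a query has a single max() or min() aggregate, the non-aggregated columns are taken from the row that holds that maximum or minimum.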
The trick is to use ORDER BY in the inner query on the average field. It works because when the outer query groups by PlayerID, the columns that are neither grouped nor aggregated take their values from the last row seen for each PlayerID. So if a PlayerID has three different averages and the inner query sorts them in ascending order, the highest average comes last, and the outer query uses the fields from that last row for that PlayerID.
Here’s the code; the added line has a comment next to it: