Hey all, I’ve got a query in need of optimizing. It works but its a dog, performance wise.
It reads like this:
SELECT *
FROM (
SELECT *
FROM views
WHERE user_id = '1'
ORDER BY
page DESC
) v
GROUP BY
v.session
I’m tracking views to different pages, and I want to know the highest page per session, in order to know how far they’ve clicked through (they’re required to view every page all the way to the end) in any given session.
Basically what I’m trying to do is ORDER the results before the GROUP. Which the above achieves, at significant cost.
Anyone who can slap me over the head with how to do this? Thanks guys!
Update:
The Explain:
"1" "PRIMARY" "<derived2>" "ALL" \N \N \N \N "3545" "Using temporary; Using filesort"
"2" "DERIVED" "views" "index" \N "page" "5" \N "196168" "Using where"
The schema:
ID int(8) unsigned (NULL) NO PRI (NULL) auto_increment select,insert,update,references
page int(8) (NULL) YES MUL (NULL) select,insert,update,references
user_id int(8) (NULL) YES (NULL) select,insert,update,references
session int(8) (NULL) YES (NULL) select,insert,update,references
created datetime (NULL) NO select,insert,update,references
Index Info:
views 0 PRIMARY 1 ID A 196008 (NULL) (NULL) BTREE
views 1 page 1 page A 259 (NULL) (NULL) YES BTREE
Ordering before grouping is a highly unreliable way to do this.
MySQLextendsGROUP BYsyntax: you can use ungrouped and unaggregated fields inSELECTandORDER BYclauses.In this case, a random value of
pageis output per eachsession.Documentation explicitly states that you should never make any assumptions on which value exactly will it be:
However, in practice, the values from the first row scanned are returned.
Since you are using an
ORDER BY page DESCin your subquery, this row happens to be the rows with a maximalpageper session.You shouldn’t rely on it, since this behaviour is undocumented and if some other row will be returned in next version, it will not be considered a bug.
But you don’t even have to do such nasty tricks.
Just use aggregate functions:
This is documented and clean way to do what you want.
Create a composite index on
(user_id, session, page)for the query to run faster.If you need all columns from your table, not only the aggregated ones, use this syntax:
This assumes that
idis aPRIMARY KEYonviews.