In situations like this, which method (or mix of methods) performs the quickest?
$year = db_get_fields("select distinct year from car_cache order by year desc");
Or
$year = db_get_fields("select year from car_cache");
$year = array_unique($year);
rsort($year); // descending, to match ORDER BY year DESC in the first query
I've heard that DISTINCT in MySQL is a big performance hit for large queries, and this table can have a million rows or more. I also wonder which storage engine (or combination of engines), InnoDB or MyISAM, would work best. I know many optimizations are very query-dependent. Year is an unsigned number, but the other fields are varchars of different lengths, which I know may make a difference too. For example:
$line = db_get_fields("select distinct line from car_cache where year='$postyear' and make='$postmake' order by line desc");
I read that the new InnoDB multiple-keys method can make queries like this one very quick, but the DISTINCT and ORDER BY clauses are red flags to me.
Have MySQL do as much work as possible. If it isn't being efficient at what it's doing, then things likely aren't set up correctly (whether that's proper indexing for the query you are trying to run, or settings such as the sort buffers).
If you have an index on the year column, then using DISTINCT should be efficient. If you do not, then a full table scan is necessary in order to fetch the distinct rows. If you try to sort out the distinct rows in PHP rather than MySQL, then you transmit (potentially) much more data from MySQL to PHP, and PHP consumes much more memory to store all that data before eliminating the duplicates.

Here is some sample output from a dev database I have. Also note that this database is on a different server on the network from where the queries are being executed.
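A small sketch of the two points above, using Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for MySQL. The table contents are made up, and SQLite's query planner is not MySQL's, so treat this as an illustration only (on MySQL you would check the plan with EXPLAIN). It shows that an index on year lets the database answer the DISTINCT query from the index, and that deduplicating in the client means receiving every row instead of one row per year.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; the schema is a
# hypothetical cut-down version of the question's car_cache table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_cache (year INTEGER, make TEXT, line TEXT)")

# 10,000 made-up rows spread over 20 model years (2000-2019).
rows = [(2000 + i % 20, "make%d" % (i % 7), "line%d" % (i % 13)) for i in range(10000)]
conn.executemany("INSERT INTO car_cache VALUES (?, ?, ?)", rows)

def plan(sql):
    """Return the detail strings from SQLite's EXPLAIN QUERY PLAN."""
    return [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql)]

distinct_sql = "SELECT DISTINCT year FROM car_cache ORDER BY year DESC"

# No index yet: SQLite has to scan the whole table to find distinct years.
plan_no_index = plan(distinct_sql)

# With an index on year (hypothetical name), the values can come from the index.
conn.execute("CREATE INDEX idx_year ON car_cache (year)")
plan_with_index = plan(distinct_sql)

# Approach 1: let the database deduplicate and sort.
db_side = [r[0] for r in conn.execute(distinct_sql)]

# Approach 2: fetch every row, then deduplicate and sort in the client.
all_years = [r[0] for r in conn.execute("SELECT year FROM car_cache")]
client_side = sorted(set(all_years), reverse=True)

print("plan without index:", plan_no_index)
print("plan with index:   ", plan_with_index)
print("rows sent, DISTINCT in database:", len(db_side))    # 20
print("rows sent, dedupe in client:    ", len(all_years))  # 10000
```

Both approaches return the same list, but the index only speeds up the database side; the 10,000-versus-20 difference in rows sent comes from where the deduplication happens.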
If I attempt the same query, except replace the SerialNumber column with one that is non-indexed, then it takes forever to run because MySQL has to examine all 97 million rows.

Some of the efficiency has to do with how much data you expect to get back. If I slightly modify the above queries to operate on the time column (the timestamp of the reading), then it takes 1 min 40 seconds to get a distinct list of 273,505 times; most of the overhead there is in transferring all the records over the network. So keep in mind how much data you are getting back: you want to keep that as low as possible for the data you are trying to fetch.

As for your final query:
There should be no problem with that either; just make sure you have a compound index on year and make, and possibly an index on line.

On a final note, the engine I am using for the readings table is InnoDB, and my server is 5.5.23-55-log Percona Server (GPL), Release 25.3, which is a version of MySQL by Percona Inc.

Hope that helps.
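The same kind of SQLite sketch for the final query, again assuming a cut-down car_cache schema with made-up rows and using SQLite only as a stand-in for MySQL. It creates a compound index on (year, make) (the name idx_year_make is hypothetical) and passes the year and make as bound ? parameters rather than interpolating $postyear and $postmake into the SQL string. The plan should show a SEARCH using that index instead of a full table scan; on MySQL the equivalent check is running EXPLAIN on the query.

```python
import sqlite3

# In-memory SQLite stand-in for car_cache; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car_cache (year INTEGER, make TEXT, line TEXT)")
rows = [(2000 + i % 20, "make%d" % (i % 7), "line%d" % (i % 13)) for i in range(10000)]
conn.executemany("INSERT INTO car_cache VALUES (?, ?, ?)", rows)

# Compound index whose columns match the equality tests in the WHERE clause
# (index name is hypothetical).
conn.execute("CREATE INDEX idx_year_make ON car_cache (year, make)")

# Placeholders keep user input (like $postyear/$postmake) out of the SQL text.
sql = ("SELECT DISTINCT line FROM car_cache "
       "WHERE year = ? AND make = ? ORDER BY line DESC")
params = (2005, "make3")

# Detail column of EXPLAIN QUERY PLAN; expect a SEARCH using idx_year_make.
plan = [r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]
lines = [r[0] for r in conn.execute(sql, params)]

print(plan)
print(lines)
```

Only the rows matching year and make are read through the index, so the DISTINCT and ORDER BY work on a small subset rather than the whole table.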