Jeff Atwood once wrote that he found that querying the database for primary keys, and then fetching all the relevant fields with an IN clause, is twice as fast as the equivalent single-query approach.
I wonder if this applies to all situations, and if not, what are the cases when it still provides significant room for improvement in terms of performance?
Furthermore, how expensive is it to access a database through a scripting-language library? I’m mostly talking about the very popular PHP-MySQL combination.
Jeff Atwood is talking about SQL Server, not MySQL. SQL optimisations are notoriously dependent on the DBMS, the configuration, the query, the data, and the state of the cache. Other than saying that selecting just the primary key fields will be at least as fast as selecting the entire row, it’s hard to generalise, and certainly hard to generalise to any degree that would be useful. You’ll have to benchmark your particular case.
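If it helps, here is a rough sketch of what such a benchmark could look like. It is only an illustration under stated assumptions: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in, and the `posts` table, its columns, and the helper names are all made up for the example. The timings it prints say nothing about MySQL or SQL Server; to learn anything useful you would point the same two query shapes at your own server, schema, and data.

```python
import sqlite3
import time


def build_db(rows=5000):
    """Create a hypothetical in-memory table; a stand-in for your real schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        "id INTEGER PRIMARY KEY, score INTEGER, title TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO posts (score, title, body) VALUES (?, ?, ?)",
        ((i % 100, f"title {i}", "x" * 200) for i in range(rows)),
    )
    conn.execute("CREATE INDEX idx_posts_score ON posts (score)")
    conn.commit()
    return conn


def single_query(conn, min_score):
    """One statement that filters and fetches whole rows."""
    return conn.execute(
        "SELECT * FROM posts WHERE score >= ? ORDER BY id", (min_score,)
    ).fetchall()


def two_step(conn, min_score):
    """Fetch only the primary keys first, then the full rows with an IN clause."""
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM posts WHERE score >= ? ORDER BY id", (min_score,)
        )
    ]
    if not ids:
        return []  # an empty IN () list is not valid SQL in most databases
    # Assumption: the id list is small enough for the driver's parameter limit.
    placeholders = ",".join("?" * len(ids))
    return conn.execute(
        f"SELECT * FROM posts WHERE id IN ({placeholders}) ORDER BY id", ids
    ).fetchall()


def average_seconds(fn, *args, repeat=20):
    """Average wall-clock time of fn(*args) over several runs."""
    start = time.perf_counter()
    for _ in range(repeat):
        fn(*args)
    return (time.perf_counter() - start) / repeat


if __name__ == "__main__":
    conn = build_db()
    print("single query:", average_seconds(single_query, conn, 95))
    print("two-step IN :", average_seconds(two_step, conn, 95))
```

Both functions return the same rows, so the only thing being compared is the cost of the two approaches. Run it several times, with both a warm and a cold cache if you can, because the state of the cache can easily dominate the result.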
Based on my experience with MySQL, I’d be surprised if selecting the details with an IN query were faster than doing a SELECT * in the first place. My understanding is that SELECT * is more expensive than SELECT id because MySQL has to look up the index data in both cases, but in the former case it has to take the additional step of fetching the data that makes up the rest of the row, which may require further disk seeks (especially since the table data is less likely to be in the cache than the index). However, with an InnoDB clustered index (which the primary key will be if you’re using InnoDB), there is a special case: the row data is stored alongside the index entry in the clustered index. In this case, I believe SELECT * will be almost the same speed as SELECT id.