I’m debugging a problem with slow queries on a MySQL server. Queries normally complete in 100-400 milliseconds but sometimes rocket to tens or even hundreds of seconds.
The queries are generated by an application over which I have no control, and there are multiple databases (one for each customer). The slow queries seem to appear randomly, and RAM, disk, and CPU are not under load when the slow queries are logged. When I run the queries manually, they run fine (in milliseconds), which makes me suspect locking issues in combination with other read and write queries. The queries themselves are horrible (unable to use an index in either the WHERE or the ORDER BY clause), but the largest tables are relatively small (up to 200,000 rows) and there are almost no JOINs. When I profile the queries, most of the time is spent sorting the result (in the case where the query runs fine).
I’m unable to reproduce the extreme slowness in a test environment. My best idea right now is to stop the production MySQL server, create a copy of the databases, enable full query logging, and start the server again. That way I should be able to replay the load and reproduce the problem. But the general query log seems to record only the query, not the target database for the query. Do I have any other record/replay options for MySQL?
I finally nailed the problem. The application is doing something like this:
It fetches and processes the result set one row at a time. If the loop takes 100 seconds to complete, the table stays locked on the server for 100 seconds.
Changing this setting on the MySQL server:
made the slow queries disappear instantly: result sets are now pushed to a temporary table, so the table lock can be released regardless of how slowly the application consumes the result set. The setting brings a host of other performance problems with it, but fortunately the server is sized to handle them.
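To make the mechanism concrete, here is a toy Python simulation of the two behaviours. It is only a sketch and does not talk to MySQL: the `Clock` and `Table` classes, the function names, and the cost figures are all made up for illustration, and time is simulated rather than measured. The behaviour described above matches what the MySQL manual documents for the `sql_buffer_result` system variable, but that this is the setting that was changed is an assumption.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Clock:
    """Simulated clock, so the example is deterministic (no real sleeping)."""
    now: float = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


@dataclass
class Table:
    """Hypothetical stand-in for a table protected by a table-level lock."""
    rows: List[int]
    lock_held_for: float = 0.0  # simulated seconds the lock has been held

    def select_all(self, clock: Clock, read_cost: float) -> Iterator[int]:
        """Yield rows one by one; the lock is held until the last row is consumed."""
        acquired_at = clock.now
        try:
            for row in self.rows:
                clock.advance(read_cost)  # server-side cost of producing one row
                yield row
        finally:
            # Lock is released only once the consumer has drained the result set.
            self.lock_held_for += clock.now - acquired_at


def lock_time_streaming(row_count: int, read_cost: float, work_per_row: float) -> float:
    """Client processes each row as it arrives, so its work happens under the lock."""
    table, clock = Table(list(range(row_count))), Clock()
    for _row in table.select_all(clock, read_cost):
        clock.advance(work_per_row)  # slow application work per row
    return table.lock_held_for


def lock_time_buffered(row_count: int, read_cost: float, work_per_row: float) -> float:
    """Result set is copied first (like a temp table), then processed lock-free."""
    table, clock = Table(list(range(row_count))), Clock()
    buffered_rows = list(table.select_all(clock, read_cost))  # lock released here
    for _row in buffered_rows:
        clock.advance(work_per_row)
    return table.lock_held_for


# Assumed figures: 100 rows, 1 ms to read a row, 1 s of client work per row.
streamed = lock_time_streaming(100, 0.001, 1.0)
buffered = lock_time_buffered(100, 0.001, 1.0)
print(f"lock held while streaming: {streamed:.1f} s")
print(f"lock held with buffering:  {buffered:.1f} s")
```

With these assumed figures the lock is held for roughly 100 seconds in the streaming case and roughly a tenth of a second in the buffered case, even though the client does the same total amount of work in both. The cost of buffering is the extra copy of every result set, which fits the other performance problems mentioned above.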