I have Sphinx Search running on a Linux server with 38GB of RAM. The sphinx index contains 35M full text documents plus meta data indexed from a MySQL table. When I launch a fresh server, I run a script that “warms up the sphinx cache” by sending my 10,000 most common queries through it. It takes about an hour to run the warm up script the first time, but the same script completes in just a few minutes if I run it again.
My confusion arises from the fact that Sphinx doesn’t have any documented caching, other than a file based cache that I am not using. The index is loaded into memory when Sphinx starts, but individual queries take the same length of time each time they are run after the system has been “warmed up”.
There is a clear warm up period when I run my scripts. What is going on? Is Linux caching something that helps Sphinx run faster? Does the underlying MySQL system cache queries ( I believe Sphinx is basically a custom MySQL storage engine )? How are new queries that have never been run being made faster by what is going on?
I realize there is likely a very complex explanation for this, but even a little direction should help be dig deeper.
searchd itself doesnt have any caching – but as mentioned as indexed are read from, the OS may well start caching the files – so dont have to go all the way back to disk.
If you are using SphinxSE – then queries may be cached by the normal mysql query cache – so whole result sets are cached. But in addiction, the usual way to use SphinxSE, is to join the search results back with the original dataset, so you get both returned to the app in one go. So your queries are also dependent on the real mysql data tables. And they will be subject to the same OS caching – as mysql reads data it will be cached.
that suggests you are using a VM? If so the virtual disk might actully be located on a remote SAN. (or EBS on Amazon ec2)
Depending on where your VM is hosted might be able to get some special high performance disks – ideally local to the host – maybe even SSD – which may well help.
Anyway to trace the issue, more you should almost certainly enable the sphinx query log. Look into that to see if queries are slow executing there. There is also a startup upoption to searchd – where you can enable iostats. This will log more information to the quyery log about io stats as queries are run. This can give you additional insights.