I have a MySQL database with 2 million records. I’m already using Sphinx to index the data and search it quickly.
I have two indexes: one big index, which is rotated every day at 3 AM, and one smaller index that holds incremental changes only. The smaller one is rotated every 30 minutes and indexes only the new rows in the database (those inserted after 3 AM).
Everything works and search is fine, but I’m looking for improvements. I don’t need to reindex the big database every day, because once the information is inserted in the database, it doesn’t change (I have only inserts, no updates). So rebuilding the large index is absolutely useless.
Is it possible to split this index into yearly or even monthly indexes? Would that speed up or slow down the search queries? Are there any examples of how to organize the indexes and data sources? Would it be better to switch to real-time indexes?
You could just use the index merging feature:
http://sphinxsearch.com/docs/current.html#index-merging
Once a day, merge your ‘delta’ back into the ‘main’. Right after merging, you will want to update the counter table: the data in the main index has changed, so the value stored there (the highest ID already covered by main) has to change with it.
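The daily job described above could be sketched roughly like this. This is only an illustration under assumptions: the index names `main` and `delta`, the `sph_counter` table with `counter_id` and `max_doc_id` columns (the layout used in the Sphinx delta-index example), and the two injected callables `run` and `execute_sql` are placeholders for whatever your setup actually uses. Only `indexer --merge <dst> <src> --rotate` is taken from the Sphinx docs.

```python
# Assumed counter table layout: sph_counter(counter_id, max_doc_id).
COUNTER_UPDATE_SQL = "UPDATE sph_counter SET max_doc_id = %s WHERE counter_id = 1"


def merge_command(main="main", delta="delta", config=None):
    """Build the indexer command that merges `delta` into `main`."""
    cmd = ["indexer"]
    if config:
        cmd += ["--config", config]
    # --rotate asks a running searchd to pick up the merged index.
    cmd += ["--merge", main, delta, "--rotate"]
    return cmd


def run_daily_merge(run, execute_sql, delta_max_id, main="main", delta="delta"):
    """Merge delta into main, then record the highest ID main now covers.

    `run` and `execute_sql` are hypothetical hooks: `run` takes an argv list
    (for example a wrapper around subprocess.run with check=True) and
    `execute_sql` takes a query plus a parameter tuple.
    `delta_max_id` is the highest document ID that was in the delta when it
    was last built, not the current MAX(id) of the table.
    """
    run(merge_command(main, delta))
    # Reached only if `run` did not raise, so a failed merge leaves the
    # counter untouched and the next delta still covers the same rows.
    execute_sql(COUNTER_UPDATE_SQL, (delta_max_id,))
```

Passing the ID that the delta actually covered, rather than reading the table maximum at merge time, avoids skipping rows inserted between the last delta build and the merge.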
(In general, the more indexes you have, the more search performance is affected. Eventually, searching lots of small indexes takes more work than it saves. The exception is when a query only needs part of the data. For example, if you have queries that only search records from the last year, you can tweak them so they only search the latest index. That is more efficient than searching all the records only to discard many.)
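If you did split by month, picking only the indexes a query needs might look like the sketch below. The naming scheme `docs_YYYY_MM` and the `docs` prefix are made up for illustration; the only Sphinx behaviour relied on is that a query can name several indexes as a comma-separated list.

```python
from datetime import date


def monthly_indexes(start, end, prefix="docs"):
    """Return the names of the monthly indexes covering start..end inclusive.

    Index names follow a hypothetical `<prefix>_YYYY_MM` convention.
    """
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}_{year:04d}_{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


def index_list(start, end, prefix="docs"):
    """Comma-separated index list, usable as the index argument of a query."""
    return ",".join(monthly_indexes(start, end, prefix))
```

A query limited to one month then touches a single index, while a query over many years touches dozens, which is where the overhead mentioned above starts to outweigh the savings.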