For a Google sitemap XML, I need all the document IDs collected by Sphinx. But with 1000+ documents, if I try to get them all in a simple loop, I eventually get: Error: searchd error: offset out of bounds (offset=1000, max_matches=1000).
I could increase the max_matches setting, but that would kill performance.
And I don’t want to simply run a MySQL query, because there’s a UNION and a bunch of checks/rules in the Sphinx indexer query, and I want my query in one place for maintainability.
So what I’ve done now is, for each category (I need those too for the sitemap), I run a Sphinx query filtered on category. That way I stay below the 1000 documents limit.
There must be a better solution for this. Right?
I’ve posted PHP code for this here:
http://sphinxsearch.com/forum/view.html?id=7215
Basically, you just retrieve the results 1000 documents at a time in a while loop. Sitemaps don’t care about the order of results in the file, so it doesn’t matter that you need to get the results in document_id order.
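For anyone who can't reach the forum link, here is a rough Python sketch of the same idea (not the PHP code from that post): ask for a page of IDs in ascending order, then restart the next query just past the last ID seen, so the offset never goes beyond max_matches. `fetch_page` and `make_fake_backend` are made-up names for illustration. With the real Sphinx client, the callback would presumably set an ID range with `SetIDRange`, sort by `@id ASC`, and call `SetLimits(0, 1000)` before running the query; that wiring is an assumption and is not shown here.

```python
from typing import Callable, Iterable, Iterator, List

PAGE_SIZE = 1000  # matches the max_matches=1000 limit from the error message


def iter_all_ids(fetch_page: Callable[[int, int], List[int]],
                 page_size: int = PAGE_SIZE) -> Iterator[int]:
    """Yield every document ID in ascending order, one page at a time.

    fetch_page(min_id, limit) is a hypothetical callback that must return
    at most `limit` document IDs that are >= min_id, sorted ascending.
    """
    min_id = 0
    while True:
        ids = fetch_page(min_id, page_size)
        if not ids:
            break
        yield from ids
        if len(ids) < page_size:
            break  # a short page means there is nothing left to fetch
        min_id = ids[-1] + 1  # resume just past the last ID seen


def make_fake_backend(all_ids: Iterable[int]) -> Callable[[int, int], List[int]]:
    """In-memory stand-in for the search daemon, used only for this demo."""
    ordered = sorted(all_ids)

    def fetch_page(min_id: int, limit: int) -> List[int]:
        return [doc_id for doc_id in ordered if doc_id >= min_id][:limit]

    return fetch_page


if __name__ == "__main__":
    fetch = make_fake_backend(range(1, 2501))
    print(sum(1 for _ in iter_all_ids(fetch)))
```

Because every query starts at offset 0 and only the minimum ID moves, each request stays inside the 1000-match window no matter how many documents the index holds.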