Is there a way to iterate over a Solrj response such that the results are fetched incrementally during iteration, rather than returning a giant in-memory ArrayList?
Or do we have to resort to this:
SolrQuery query = new SolrQuery();
query.setQuery("*:*");
int fetchSize = 1000;
query.setRows(fetchSize);
QueryResponse rsp = server.query(query);
long offset = 0;
long totalResults = rsp.getResults().getNumFound();
while (offset < totalResults)
{
query.setStart((int) offset); // requires an int? wtf?
query.setRows(fetchSize);
for (SolrDocument doc : server.query(query).getResults())
{
log.info((String) doc.getFieldValue("title"));
}
offset += fetchSize;
}
And while I’m on the topic, why does SolrQuery.setStart() require an integer, when SolrDocumentList.getStart()/getNumFound() return long?
That code looks correct. You could also wrap it in an Iterator so that your client code doesn’t have to know anything about the underlying paging.
About
SolrQuery.setStart()requiring an Integer, it certainly looks odd, I think you’re right and it should be a long as well. Try asking on the solr-user or lucene-dev mailing lists.