I’m posting documents to the Solr server in batches of roughly 5,000, with a commit after each batch. After all the commits have finished, the Solr admin panel reports only about 5,000 documents instead of the expected 280,000.
It looks like the documents are being overwritten every time I call commit. However, the index keeps growing in size.
Here is the API that I’m referring to:
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/SolrServer.html#add%28java.util.Collection%29
Here is the code:
private final SolrServer server;
this.server = new CommonsHttpSolrServer(getPropertyManager().getSolrMasterUrl());
final Collection<UpdateResponse> responses = new ArrayList<UpdateResponse>(4);
responses.add( server.add(solrDocuments) );
responses.add( server.optimize() );
responses.add( server.commit() );
The index grows by some KB every time another 5,000 documents are committed, yet the Solr admin panel still reports only about 5,000 documents, which does not make sense to me:
numDocs: 5164
maxDoc: 5164
version: 1332445599423
segmentCount: 1
current: true
hasDeletions: false
directory: org.apache.lucene.store.SimpleFSDirectory:org.apache.lucene.store.SimpleFSDirectory@ Z:\jboss-soa-p-5\jboss-as\server\experimental\solr\data\index lockFactory=org.apache.lucene.store.NativeFSLockFactory@8d921a
lastModified: 2012-03-23T13:38:53.539Z
Check that the 5000 documents you are sending each time are actually unique. If you send the same set of documents each time, Solr does not add them a second time: a document whose unique key matches an existing document replaces the old copy, so the document count stays the same.
This is controlled by the <uniqueKey> setting in your schema.xml file. So, if your documents have an id field that is specified as the uniqueKey, and you number them 1 to 5000 in every batch, each batch simply overwrites the previous one and numDocs stays at about 5000.
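To make the effect concrete, here is a minimal sketch in plain Java. It is not Solr code: a HashMap stands in for the index, with the map key playing the role of the uniqueKey field. The class and method names (UniqueKeyDemo, indexBatches) are made up for illustration, and the batch count of 56 is just 280,000 / 5,000 from the question.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of an index keyed by a unique id. This is NOT how Solr is
// implemented; it only mimics the "same key replaces the old document" rule.
final class UniqueKeyDemo {

    // Sends `batches` batches of `batchSize` documents into the toy index and
    // returns how many documents it holds afterwards.
    static int indexBatches(int batches, int batchSize, boolean reuseIds) {
        Map<String, String> index = new HashMap<String, String>();
        for (int b = 0; b < batches; b++) {
            for (int i = 1; i <= batchSize; i++) {
                // reuseIds == true : every batch numbers its documents 1..batchSize
                // reuseIds == false: ids keep counting up across batches
                int id = reuseIds ? i : b * batchSize + i;
                // put() with an existing key replaces the previous entry
                index.put(String.valueOf(id), "document from batch " + b);
            }
        }
        return index.size();
    }

    public static void main(String[] args) {
        System.out.println(indexBatches(56, 5000, true));  // prints 5000
        System.out.println(indexBatches(56, 5000, false)); // prints 280000
    }
}
```

If your real ids behave like the first call, the fix is on the client side: make sure the field named in <uniqueKey> gets a value that is distinct across all batches, not just within one batch.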