I am using Lucene in a Tomcat application. I have a large number of big and small documents that need indexing. The big documents are added infrequently and the small documents are added frequently. My current plan is to flush the indices from a singleton thread that runs inside the Tomcat application. I want to do this because the frequent small document adds should not force a flush every time a document is added. As a result, the index will always lag behind the documents being indexed.
My questions are: if the adder function is not doing the flush, and for some reason Lucene throws an IOException when the flush is called in the thread, how will the application know which documents are effectively unindexed, and what can be done about it? Trying to re-add the same data doesn't seem to be the right solution, because the exception will likely happen again.
Also, is it bad to run Lucene inside the Tomcat cluster? Should I be running Lucene in a separate Java process, and if so, how would that work?
And is it safer to store the Lucene index on the file system or in MySQL? Obviously, these are newbie questions.
Andy
Lucene is a library that is really meant to be embedded in a single client application. If you plan to use it in a cluster of servers, then using Solr is the best approach. You can also cluster Solr instances and use a simple RESTful API to index new documents. Using Lucene inside Tomcat might work, but it will be hard to maintain in the future. With Solr you get an admin UI, and you can easily flush everything.
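If you do keep Lucene embedded in Tomcat, one way to know which documents are unindexed after a failed flush is to remember the IDs added since the last successful commit. The following is only a minimal sketch under stated assumptions: `PendingCommitTracker`, `Committer`, `recordAdded`, `flush` and `unindexed` are hypothetical names, and `Committer` stands in for whatever your flush thread actually calls (for example `IndexWriter.commit()`, which can throw an IOException).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PendingCommitTracker {

    /** Hypothetical stand-in for the real flush call, e.g. IndexWriter.commit(). */
    @FunctionalInterface
    public interface Committer {
        void commit() throws IOException;
    }

    // IDs of documents added since the last successful commit, in insertion order.
    private final Set<String> pending = new LinkedHashSet<>();
    private final Committer committer;

    public PendingCommitTracker(Committer committer) {
        this.committer = committer;
    }

    /** Called by the adder after it has handed the document to the index writer. */
    public synchronized void recordAdded(String docId) {
        pending.add(docId);
    }

    /**
     * Called by the background flush thread.
     * Returns true if the commit succeeded; on failure the pending IDs are kept.
     * For simplicity the lock is held during the commit, so adders wait meanwhile.
     */
    public synchronized boolean flush() {
        try {
            committer.commit();
            pending.clear();   // everything recorded so far is now committed
            return true;
        } catch (IOException e) {
            return false;      // pending IDs stay, so the caller can report or retry later
        }
    }

    /** Snapshot of document IDs that are not yet known to be committed. */
    public synchronized List<String> unindexed() {
        return new ArrayList<>(pending);
    }
}
```

The flush thread would call `flush()` on a timer and, when it returns false, log or alert on `unindexed()` rather than blindly re-adding the documents. Whether a retry can succeed depends on the cause of the IOException (a full disk, for instance, has to be fixed first), which this sketch does not try to decide.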