I’ve used Lucene.net to implement search functionality (for both database content and uploaded documents) on several small websites with no problem. Now I’ve got a site where I’m indexing 5000+ documents (mainly PDFs) and the querying is becoming a bit slow.
I’m assuming the best way to speed it up would be to implement caching of some kind. Can anyone give me any pointers / examples on where to start? If you’ve got any other suggestions besides caching (e.g. should I be using multiple indexes?), I’d like to hear those too.
Edit:
Dumb user error was responsible for the slow querying. I was creating highlights for the entire result set at once, instead of just the ‘page’ I was displaying. Oops.
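For anyone who hits the same thing: the fix is to slice the hit list down to the current page *before* generating highlights, so the expensive highlighting call runs page-size times rather than once per hit. A minimal sketch in plain Java (the `highlight` method here is a hypothetical stand-in for whatever fragment-generation call you use, e.g. Lucene's Highlighter):

```java
import java.util.ArrayList;
import java.util.List;

public class PagedHighlights {
    // Hypothetical stand-in for generating a highlighted snippet for one hit;
    // in real code this would be the (comparatively expensive) Highlighter call.
    static String highlight(String docText) {
        return "<b>" + docText + "</b>";
    }

    // Highlight only the hits on the requested page, not the whole result set.
    static List<String> highlightPage(List<String> allHits, int page, int pageSize) {
        int from = Math.min(page * pageSize, allHits.size());
        int to = Math.min(from + pageSize, allHits.size());
        List<String> snippets = new ArrayList<>();
        for (String hit : allHits.subList(from, to)) {
            snippets.add(highlight(hit)); // runs pageSize times, not allHits.size() times
        }
        return snippets;
    }

    public static void main(String[] args) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < 5000; i++) hits.add("doc" + i);
        List<String> pageTwo = highlightPage(hits, 2, 10); // hits 20..29 only
        System.out.println(pageTwo.size()); // 10
        System.out.println(pageTwo.get(0)); // <b>doc20</b>
    }
}
```

With 5000+ hits and a page size of 10, that's 10 highlight calls per request instead of 5000.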
I’m going to make a big assumption here and assume you’re not hanging onto your index searchers in-between calls to query the index.
If that’s true, then you should definitely share index searchers across all queries to your index. As the index grows (and it doesn’t have to get very large for this to become a factor), rebuilding the index searcher becomes more and more of an overhead. To make this work correctly, you’ll need to synchronise access to the query parser class (it isn’t thread-safe).
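In outline, that pattern looks like this (plain Java, with stub classes standing in for Lucene's `IndexSearcher` and `QueryParser`, since the real ones need an index on disk): build the searcher once, reuse it from every request, and take a lock only around the parser, since searching itself can run concurrently.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SharedSearcher {
    // Stubs standing in for Lucene's IndexSearcher / QueryParser.
    static class IndexSearcher {
        static final AtomicInteger opens = new AtomicInteger(); // counts expensive constructions
        IndexSearcher() { opens.incrementAndGet(); }
        String search(String query) { return "results for: " + query; }
    }
    static class QueryParser {
        String parse(String raw) { return raw.trim().toLowerCase(); }
    }

    private static volatile IndexSearcher searcher;        // shared across all requests
    private static final QueryParser parser = new QueryParser();
    private static final Object parserLock = new Object(); // QueryParser isn't thread-safe

    // Lazily build the searcher once; every query reuses the same instance.
    static IndexSearcher getSearcher() {
        if (searcher == null) {
            synchronized (SharedSearcher.class) {
                if (searcher == null) searcher = new IndexSearcher();
            }
        }
        return searcher;
    }

    static String runQuery(String raw) {
        String query;
        synchronized (parserLock) {         // serialise only the non-thread-safe parsing step
            query = parser.parse(raw);
        }
        return getSearcher().search(query); // searching can proceed concurrently
    }

    public static void main(String[] args) {
        System.out.println(runQuery("  Lucene  "));        // results for: lucene
        runQuery("caching");
        System.out.println(IndexSearcher.opens.get());     // 1 — built once, reused
    }
}
```

The one thing this sketch doesn't cover: after you write new documents to the index, you'll need to swap in a fresh searcher to see them, so real code usually adds a "reopen on update" step as well.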
BTW, the Java docs are (I’ve found) just as applicable to the .net version.
For more info on your problem, see here: http://wiki.apache.org/lucene-java/ImproveSearchingSpeed