I’m new to Lucene.NET, but I’m using an open source tool built for Sitecore CMS that uses Lucene.NET to index lots of content from the CMS. I confirmed yesterday that when I rebuild my indexes, the current index files are wiped clean, so anything that relies on the index gets no data for about 30-60 seconds (the time a full index rebuild takes). Is there a best practice or way to make Lucene.NET not overwrite the current index files until the new index is completely rebuilt? I’m basically thinking I’d like it to write to new temp index files and, when the rebuild is done, have those files overwrite the current index.
Example of what I’m talking about:
- Build fresh index (~30 seconds)
- Index has about 500 documents
- Use code to access data in index and display on website
- Rebuild index (~30 seconds)
- Any code that now reads the index for data returns nothing because the index files are being overwritten; results in website not showing any data
- Rebuild complete: data now available again, data back on website
Thanks in advance
I have no experience with “Sitecore” itself but here’s my story.
We’ve recently incorporated index-based search (using Lucene.Net) into our eCommerce sub-system. In our case the index update might take about half an hour (~50,000 products plus lots of related information). To prevent “denial of service” responses during the update, we first create a “backup” version of the index (simply copying the index directory to another location), and all further requests are redirected to this “backup” version. When the update is completed, we delete the backup so that clients start using the updated (or “live”) version of the index. This also helps with unhandled exceptions during the update process: without a backup you might end up with no index at all, whereas in our case clients can always fall back to the “backup” version.
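To make the flow above concrete, here is a minimal sketch of the copy, redirect, update, delete sequence. It is written in Python rather than C#, uses plain directories in place of a real Lucene index, and is not our actual code: the names `active_index_dir` and `rebuild_with_backup`, the `.tmp` staging suffix, and the `rebuild` callback are all hypothetical.

```python
import shutil
from pathlib import Path
from typing import Callable


def active_index_dir(live: Path, backup: Path) -> Path:
    """Readers call this before searching: the backup wins while it exists."""
    return backup if backup.exists() else live


def rebuild_with_backup(live: Path, backup: Path,
                        rebuild: Callable[[Path], None]) -> None:
    """Copy the live index aside, rebuild it, then drop the copy on success.

    Assumes `live` already exists. If `backup` is still present, a previous
    rebuild failed, so that copy is kept as the last known-good index.
    """
    if not backup.exists():
        # Copy under a staging name first so readers never see a half-copied
        # backup; the rename makes it visible in one step.
        staging = backup.with_name(backup.name + ".tmp")
        if staging.exists():
            shutil.rmtree(staging)
        shutil.copytree(live, staging)
        staging.rename(backup)

    # Hypothetical callback that rewrites the index in `live`.
    # If it raises, the backup is left in place and keeps serving readers.
    rebuild(live)

    # Success: removing the backup sends readers back to the live index.
    shutil.rmtree(backup)
```

The search code would call `active_index_dir(...)` each time it opens the index. The sketch does not deal with readers that still hold the backup open at the moment it is deleted; how that behaves depends on the platform and on how the real index files are opened.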
The API reference (Lucene 2.4) for the Lucene.Net.Index.IndexWriter object states the following:
So at least you shouldn’t worry about the clients that are currently searching within your index.
Hope this helps you make the right decision.