My Lucene Java implementation is eating up too many files. I followed the instructions

Question

0

Asked: May 23, 20262026-05-23T08:53:28+00:00 2026-05-23T08:53:28+00:00

My Lucene Java implementation is eating up too many files. I followed the instructions

0

My Lucene Java implementation is eating up too many files. I followed the instructions in the Lucene Wiki about too many open files, but that only helped slow the problem. Here is my code to add objects (PTicket) to the index:

//This gets called when the bean is instantiated
public void initializeIndex() {
    analyzer = new WhitespaceAnalyzer(Version.LUCENE_32);
    config = new IndexWriterConfig(Version.LUCENE_32, analyzer);

}


public void addAllToIndex(Collection<PTicket> records) {  
    IndexWriter indexWriter = null;
    config = new IndexWriterConfig(Version.LUCENE_32, analyzer);

    try{
        indexWriter = new IndexWriter(directory, config);
        for(PTicket record : records) {
            Document doc = new Document();
            StringBuffer documentText = new StringBuffer();
            doc.add(new Field("_id", record.getIdAsString(), Field.Store.YES, Field.Index.ANALYZED));
            doc.add(new Field("_type", record.getType(), Field.Store.YES, Field.Index.ANALYZED));

            for(String key : record.getProps().keySet()) {
                List<String> vals = record.getProps().get(key);

                for(String val : vals) {
                    addToDocument(doc, key, val);
                    documentText.append(val).append(" ");
                }
            }
            addToDocument(doc, DOC_TEXT, documentText.toString());        
            indexWriter.addDocument(doc);    
        }

        indexWriter.optimize();
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        cleanup(indexWriter);
    }
}

private void cleanup(IndexWriter iw) {
    if(iw == null) {
        return;
    }

    try{
        iw.close();
    } catch (IOException ioe) {
        logger.error("Error trying to close index writer");
        logger.error("{}", ioe.getClass().getName());
        logger.error("{}", ioe.getMessage());
    }
}

private void addToDocument(Document doc, String field, String value) {
    doc.add(new Field(field, value, Field.Store.YES, Field.Index.ANALYZED));
}

EDIT TO ADD code for searching

public Set<Object> searchIndex(AthenaSearch search) {  

    try {
        Query q = new QueryParser(Version.LUCENE_32, DOC_TEXT, analyzer).parse(query);

        //search is actually instantiated in initialization.  Lucene recommends this.
        //IndexSearcher searcher = new IndexSearcher(directory, true);
        TopDocs topDocs = searcher.search(q, numResults);
        ScoreDoc[] hits = topDocs.scoreDocs;
        for(int i=start;i<hits.length;++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            ids.add(d.get("_id"));
        }
        return ids;
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}

This code is in a web application.

1) Is this the advised way to use IndexWriter (instantiating a new one on each add to index)?

2) I’ve read that raising ulimit will help, but that just seems like a band-aid that won’t address the actual problem.

3) Could the problem lie with IndexSearcher?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-23T08:53:29+00:00

1) Is this the advised way to use
IndexWriter (instantiating a new one
on each add to index)?

i advise No, there are constructors, which will check if exists or create a new writer, in the directory containing the index. problem 2 would be solved if you reuse the indexwriter.

EDIT:

Ok it seems in Lucene 3.2 the most but one constructors are deprecated,so the resue of Indexwriter can be achieved by using Enum IndexWriterConfig.OpenMode with value CREATE_OR_APPEND.

also, opening new writer and closing on each document add is not efficient,i suggest reuse, if you want to speed up indexing, set the setRamBufferSize default value is 16MB, so do it by trial and error method

from the docs:

Note that you can open an index with
create=true even while readers are
using the index. The old readers will
continue to search the “point in time”
snapshot they had opened, and won’t
see the newly created index until they
re-open.

also reuse the IndexSearcher,i cannot see the code for searching, but Indexsearcher is threadsafe and can be used as Readonly as well

also i suggest you to use MergeFactor on writer, this is not necessary but will help on limiting the creation of inverted index files, do it by trial and error method

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

My Lucene Java implementation is eating up too many files. I followed the instructions

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply