As the title suggests: should I create a new Document for each file, or should I reuse the same Document object for every file that I read and then send it off to the index?
Currently I am doing this:
// Loop over each file (filePaths is a placeholder for the list of files to index)
foreach (string path in filePaths)
{
    var document = new Document();
    string fileData = File.ReadAllText(path); // Read file contents
    document.Add(new Field("text", fileData, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.WITH_POSITIONS_OFFSETS));
    indexWriter.AddDocument(document);
}
I do this for each file I read. Is this the correct approach?
Thanks
Unless you experience performance issues, creating a new document each time is the correct approach. After all, the bulk of the time is spent reading the actual file, which you will have to do either way. Saving a few cycles on instantiating a new Document is probably not going to have a big impact.
I would also be wary of reusing this object. Since it represents one file, reusing it for a different file could "leak" data between documents.
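To make the "leak" concrete, here is a minimal sketch. It does not use Lucene.NET itself: FakeDocument is a hypothetical stand-in that, like a Lucene Document, simply appends every added field to an internal list. The names FakeDocument, ReuseDemo, and both methods are made up for illustration. Assuming the real Document accumulates fields the same way, reusing one instance without clearing it means each later file would be indexed together with the text of all earlier files.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a Lucene Document: Add() appends to a field list.
class FakeDocument
{
    public List<KeyValuePair<string, string>> Fields { get; } =
        new List<KeyValuePair<string, string>>();

    public void Add(string name, string value) =>
        Fields.Add(new KeyValuePair<string, string>(name, value));
}

static class ReuseDemo
{
    // One instance shared by all files: fields from earlier files stay attached.
    public static List<int> FieldCountsWhenReused(IEnumerable<string> contents)
    {
        var counts = new List<int>();
        var document = new FakeDocument();
        foreach (string text in contents)
        {
            document.Add("text", text);
            counts.Add(document.Fields.Count); // fields that would be indexed now
        }
        return counts;
    }

    // A fresh instance per file: each document carries only its own field.
    public static List<int> FieldCountsWhenFresh(IEnumerable<string> contents)
    {
        var counts = new List<int>();
        foreach (string text in contents)
        {
            var document = new FakeDocument();
            document.Add("text", text);
            counts.Add(document.Fields.Count);
        }
        return counts;
    }
}
```

With three files, the reused instance holds 1, then 2, then 3 "text" fields, while the fresh-instance version holds exactly 1 each time. If you ever do reuse a Document for performance reasons, you would have to remove or overwrite the old fields before each AddDocument call.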