I have two Lucene indexes that I’m trying to open together with a ParallelCompositeReader in Lucene 4.X. Both indexes contain the same number of documents (14,365,790) in the same order. My code looks like this:
val articlesReader = DirectoryReader.open(FSDirectory.open(...))
val citationCountReader = DirectoryReader.open(FSDirectory.open(...))
val reader = new ParallelCompositeReader(articlesReader, citationCountReader)
When I run this code, I get the following error:
Exception in thread "main" java.lang.IllegalArgumentException: All readers must have same number of subReaders
at org.apache.lucene.index.ParallelCompositeReader.validate(ParallelCompositeReader.java:147)
at org.apache.lucene.index.ParallelCompositeReader.prepareSubReaders(ParallelCompositeReader.java:100)
at org.apache.lucene.index.ParallelCompositeReader.<init>(ParallelCompositeReader.java:71)
at org.apache.lucene.index.ParallelCompositeReader.<init>(ParallelCompositeReader.java:64)
at org.apache.lucene.index.ParallelCompositeReader.<init>(ParallelCompositeReader.java:58)
Some information about the indexes:
- The articlesReader index contains information such as the title, abstract and year of publication for each article. It was created by someone else a couple of years ago using Lucene 3.X. It’s very big and time-consuming to re-create, so I’d prefer not to modify it if at all possible.
- The citationCountReader index contains the citation count for each article. It was created by iterating over articlesReader, and it’s a Lucene 4.X index. It only takes a few hours to re-create, so if I have to re-create anything, I’d prefer it to be this one. (Though of course, I’d prefer not to have to re-create either.)
I dug a bit into the source of ParallelCompositeReader, and it seems that this error is thrown because .getSequentialSubReaders() returns a list of size 1 for articlesReader, but a list of size 3 for citationCountReader. But I don’t know what SequentialSubReaders are or how to make them the same across the two indexes. And it’s quite possible that this isn’t the key issue and/or there’s a better solution to my problem.
OK, it turns out the issue was that the two indexes had different numbers of segments. So I had to force a merge of the index with multiple segments (the citation-count index) into a single segment.
Once both indexes had a single segment, .getSequentialSubReaders() returned a list of size 1 for both readers, and the ParallelCompositeReader was able to load them.
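To see why the segment count matters, here is a toy model in plain Scala (no Lucene dependency) of the check that threw the exception. It is a sketch based on my reading of the 4.x ParallelCompositeReader.validate source, not Lucene's own code: each reader is reduced to the maxDoc of each of its segments, and ReaderShape, validate and mergeToOneSegment are hypothetical names. The three segment sizes used for the citation index are made up; only their total (14365790) comes from the question.

```scala
import scala.util.{Failure, Success, Try}

// Toy model of a composite reader, NOT Lucene code: a reader is reduced to
// the maxDoc of each of its segments (its sequential sub-readers), in order.
final case class ReaderShape(segmentMaxDocs: Vector[Int]) {
  def maxDoc: Int = segmentMaxDocs.sum
  def subReaderCount: Int = segmentMaxDocs.size
}

object ParallelCheckModel {
  // Sketch of the checks ParallelCompositeReader.validate appears to make in 4.x:
  // equal total maxDoc, equal sub-reader counts, and equal maxDoc for each pair
  // of corresponding sub-readers. (Lucene's atomic/composite type check is omitted.)
  def validate(readers: Seq[ReaderShape]): Unit = {
    require(readers.nonEmpty, "need at least one reader")
    val first = readers.head
    for (r <- readers) {
      if (r.maxDoc != first.maxDoc)
        throw new IllegalArgumentException(
          s"All readers must have same maxDoc: ${first.maxDoc}!=${r.maxDoc}")
      if (r.subReaderCount != first.subReaderCount)
        throw new IllegalArgumentException(
          "All readers must have same number of subReaders")
      for ((a, b) <- r.segmentMaxDocs.zip(first.segmentMaxDocs) if a != b)
        throw new IllegalArgumentException(
          "All readers must have same corresponding subReader maxDoc")
    }
  }

  // Effect of merging down to one segment on the model: every segment collapses
  // into one. Assumes no deleted documents, so the total maxDoc stays the same.
  def mergeToOneSegment(r: ReaderShape): ReaderShape =
    ReaderShape(Vector(r.maxDoc))

  // Turns the outcome of validate into a short human-readable message.
  def describe(result: Try[Unit]): String = result match {
    case Success(_) => "opened"
    case Failure(e) => s"failed: ${e.getMessage}"
  }

  def main(args: Array[String]): Unit = {
    val articles = ReaderShape(Vector(14365790))                   // 1 segment
    val citations = ReaderShape(Vector(10000000, 4000000, 365790)) // 3 hypothetical segments
    println(describe(Try(validate(Seq(articles, citations)))))
    // prints "failed: All readers must have same number of subReaders"
    println(describe(Try(validate(Seq(articles, mergeToOneSegment(citations))))))
    // prints "opened"
  }
}
```

The last check in validate is why matching segment counts alone would not be enough: corresponding segments must also hold the same number of documents, which is trivially true once each index has a single segment. In Lucene 4.x, merging an index down to one segment is what IndexWriter.forceMerge(1) does.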