Can anybody please explain how to calculate the ‘avgLengthPath’ variable in the BM25 implementation for Lucene? My understanding is that I have to calculate it during indexing, but it is still not clear to me how to do so.
The example provided:
IndexSearcher searcher = new IndexSearcher("IndexPath");
//Load average length
BM25Parameters.load(avgLengthPath);
BM25BooleanQuery query = new BM25BooleanQuery("This is my Query",
"Search-Field",
new StandardAnalyzer());
TopDocs top = searcher.search(query, null, 10);
ScoreDoc[] docs = top.scoreDocs;
//Print results
for (int i = 0; i < top.scoreDocs.length; i++) {
System.out.println(docs[i].doc + ":"+docs[i].score);
}
suggests that there is a method or class to load the average length from.
Would appreciate any help…
Thanks
I have solved the problem and I would like to share my answer to get any corrections or comments.
The problem was how to calculate the avgLengthPath argument. When I looked at the method that takes this argument, load(), I saw that it requires a String, which is the path to a file that contains the average length. So avgLengthPath is simply the path to that file.
Now, let's see how to create such a file. The load() method reads the file line by line and sends each pair of lines to another method called BM25Parameters.setAverageLength(). So in the avgLengthPath file, the first line is the field name and the second line is the average length for this field.
Likewise, the third line is another field and the fourth line is the average length for that field.
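To make the file layout concrete, here is a minimal sketch of a reader for that two-lines-per-field format. It is not the library's own load() code; the class name AvgLengthFileReader, the field names, and the numbers in the comment are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of the BM25 library.
// Assumed file layout (values are invented):
//   contents
//   152.5
//   title
//   8.0
final class AvgLengthFileReader {

    // Parses alternating lines: a field name, then that field's average length.
    static Map<String, Float> parse(List<String> lines) {
        Map<String, Float> averages = new LinkedHashMap<>();
        // Step two lines at a time; a trailing unpaired line is ignored.
        for (int i = 0; i + 1 < lines.size(); i += 2) {
            String field = lines.get(i).trim();
            float avg = Float.parseFloat(lines.get(i + 1).trim());
            averages.put(field, avg);
        }
        return averages;
    }

    // Reads the file at the given path and parses it.
    static Map<String, Float> read(Path path) throws IOException {
        return parse(Files.readAllLines(path, StandardCharsets.UTF_8));
    }
}
```

Each (field, average) pair produced this way corresponds to one call to BM25Parameters.setAverageLength() as described above.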
The problem with such a file is that we cannot get the document lengths from Lucene in its default settings. To overcome this, I re-indexed my collection and added the document length as one of the fields to be indexed by Lucene.
First I created a method, getDocLength(File f), that takes a file and returns the document length as a string. This method is called during the indexing process to add the document length as a field of each document.
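A rough sketch of what such a getDocLength(File f) could look like is given here. It assumes that "length" means the number of whitespace-separated tokens and that files are UTF-8; both are my assumptions, and the count will differ from what an analyzer such as StandardAnalyzer produces. The class name DocLength and the helper countTokens are made up.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper; "length" is assumed to be a whitespace token count.
final class DocLength {

    // Counts whitespace-separated tokens in the given text.
    static int countTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    // Returns the document length as a String, ready to be stored in an index field.
    static String getDocLength(File f) throws IOException {
        String text = new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
        return String.valueOf(countTokens(text));
    }
}
```

During indexing, the returned string would be added to each document as an extra stored field, so that it can be read back later.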
Finally I created a method, generateAvgLengthPathFile(), that loops through all docs in the index, calculates the average document length, and saves the result to the avgLengthPath file in the correct format.
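The averaging and writing step can be sketched as follows. To keep it self-contained, the document lengths are passed in as a list; in the real method they would be read from the stored length field of each document in the index. The class name AvgLengthFileWriter, the helper names, and the two-decimal number format are my assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; lengths are assumed to have been collected from the index already.
final class AvgLengthFileWriter {

    // Arithmetic mean of the document lengths (0.0 for an empty collection).
    static double average(List<Integer> lengths) {
        if (lengths.isEmpty()) {
            return 0.0;
        }
        long total = 0;
        for (int len : lengths) {
            total += len;
        }
        return (double) total / lengths.size();
    }

    // One file entry: the field name on one line, its average length on the next.
    static String format(String field, double avg) {
        return field + "\n" + String.format(Locale.ROOT, "%.2f", avg) + "\n";
    }

    // Writes a single-field avgLengthPath file.
    static void generateAvgLengthPathFile(Path out, String field, List<Integer> lengths)
            throws IOException {
        Files.write(out, format(field, average(lengths)).getBytes(StandardCharsets.UTF_8));
    }
}
```

For several fields, the output of format() for each field can be concatenated before writing, which gives the alternating name/average layout described earlier.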