I am trying to write a simple program using Lucene 2.9.4 that searches with a phrase query, but I am getting 0 hits:
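
    import java.io.IOException;

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopScoreDocCollector;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;

    public class HelloLucene {
        public static void main(String[] args) throws IOException, ParseException {
            // Build a small in-memory index of four titles.
            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
            Directory index = new RAMDirectory();
            IndexWriter w = new IndexWriter(index, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED);
            addDoc(w, "Lucene in Action");
            addDoc(w, "Lucene for Dummies");
            addDoc(w, "Managing Gigabytes");
            addDoc(w, "The Art of Computer Science");
            w.close();

            // Exact phrase "lucene in": two adjacent terms, no slop.
            PhraseQuery pq = new PhraseQuery();
            pq.add(new Term("content", "lucene"), 0);
            pq.add(new Term("content", "in"), 1);
            pq.setSlop(0);

            // Search and print the top hits.
            int hitsPerPage = 10;
            IndexSearcher searcher = new IndexSearcher(index, true);
            TopScoreDocCollector collector = TopScoreDocCollector.create(hitsPerPage, true);
            searcher.search(pq, collector);
            ScoreDoc[] hits = collector.topDocs().scoreDocs;
            System.out.println("Found " + hits.length + " hits.");
            for (int i = 0; i < hits.length; i++) {
                int docId = hits[i].doc;
                Document d = searcher.doc(docId);
                System.out.println((i + 1) + "." + d.get("content"));
            }
            searcher.close();
        }

        // Adds one document with a single stored "content" field.
        public static void addDoc(IndexWriter w, String value) throws IOException {
            Document doc = new Document();
            doc.add(new Field("content", value, Field.Store.YES, Field.Index.NOT_ANALYZED));
            w.addDocument(doc);
        }
    }
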
Please tell me what is wrong. I have also tried using QueryParser as follows:

    String querystr = "\"Lucene in Action\"";
    Query q = new QueryParser(Version.LUCENE_29, "content", analyzer).parse(querystr);

But this does not work either.
There are two issues with the code (and they have nothing to do with your version of Lucene):
1) The StandardAnalyzer does not index stopwords (like “in”), so the term “in” never reaches the index and the PhraseQuery will never be able to find the phrase “Lucene in”.
2) As mentioned by Xodarap and Shashikant Kore, your call to create a document needs to include Index.ANALYZED; otherwise Lucene does not run the Analyzer on this field of the Document and indexes the whole value as a single term. There’s probably a nifty way to do it with Index.NOT_ANALYZED, but I’m not familiar with it.
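To make the two points concrete, here is a small plain-Java sketch that imitates (and greatly simplifies) what happens to token positions. It is not Lucene code: the class `PhrasePositionsSketch`, its tiny stopword list, and the in-order gap check are all made up for illustration, and it assumes the analyzer leaves a position gap where a stopword was removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PhrasePositionsSketch {
    // Tiny illustrative stopword list; Lucene's real English stop set is larger.
    static final Set<String> STOPWORDS =
            new HashSet<String>(Arrays.asList("in", "of", "the", "for"));

    // Lowercase, split on whitespace, drop stopwords, and remember the
    // original position of every surviving token (gaps stay where stopwords were).
    static Map<String, List<Integer>> analyze(String text) {
        Map<String, List<Integer>> index = new HashMap<String, List<Integer>>();
        String[] words = text.toLowerCase().split("\\s+");
        for (int pos = 0; pos < words.length; pos++) {
            if (STOPWORDS.contains(words[pos])) {
                continue; // stopword is not indexed, but its position is still consumed
            }
            List<Integer> positions = index.get(words[pos]);
            if (positions == null) {
                positions = new ArrayList<Integer>();
                index.put(words[pos], positions);
            }
            positions.add(pos);
        }
        return index;
    }

    // Simplified in-order two-term phrase check: 'second' must follow 'first'
    // with at most 'slop' positions in between.
    static boolean matches(Map<String, List<Integer>> index, String first, String second, int slop) {
        List<Integer> a = index.get(first);
        List<Integer> b = index.get(second);
        if (a == null || b == null) {
            return false; // a term that was never indexed can never match
        }
        for (int p : a) {
            for (int q : b) {
                int gap = q - p - 1;
                if (gap >= 0 && gap <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> doc = analyze("Lucene in Action");
        System.out.println(matches(doc, "lucene", "in", 0));     // false: "in" was dropped
        System.out.println(matches(doc, "lucene", "action", 0)); // false: one-position gap
        System.out.println(matches(doc, "lucene", "action", 1)); // true: slop covers the gap
    }
}
```

Without analysis (Index.NOT_ANALYZED), the index would hold only the single term “Lucene in Action”, so a lookup for “lucene” would fail in the same way the lookup for “in” fails here.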
For an easy fix, change your addDoc method to:
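A minimal sketch of that change, assuming the rest of HelloLucene (including its imports) stays as posted; the only difference from your method is Field.Index.ANALYZED:

```java
// Replaces addDoc in HelloLucene (Lucene 2.9.x API, same imports as the original class).
public static void addDoc(IndexWriter w, String value) throws IOException {
    Document doc = new Document();
    // ANALYZED runs the IndexWriter's analyzer over the value, so individual
    // lowercased terms are indexed instead of the whole string as one term.
    doc.add(new Field("content", value, Field.Store.YES, Field.Index.ANALYZED));
    w.addDocument(doc);
}
```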
and modify your creation of the PhraseQuery to:
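Something along these lines, as a sketch that keeps your variable name `pq` and assumes the field has been indexed with Index.ANALYZED:

```java
// Exact phrase "computer science": neither term is a stopword,
// so both are present in the analyzed index at adjacent positions.
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "computer"), 0);
pq.add(new Term("content", "science"), 1);
pq.setSlop(0);
```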
This will match only “The Art of Computer Science”, since neither “computer” nor “science” is a stopword.
If you want to find “Lucene in Action”, you can query for “lucene” and “action” and increase the slop of the PhraseQuery (the allowed ‘gap’ between the two words) to cover the position of the removed stopword:
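For instance, a sketch that assumes the analyzer keeps a position gap where “in” was removed (which I believe is the default for StandardAnalyzer with Version.LUCENE_29) and that the field is indexed with Index.ANALYZED:

```java
// "lucene" and "action" are one position apart once "in" is dropped,
// so allow a slop of 1 instead of requiring strict adjacency.
PhraseQuery pq = new PhraseQuery();
pq.add(new Term("content", "lucene"), 0);
pq.add(new Term("content", "action"), 1);
pq.setSlop(1);
```

Note that a slop of 1 is looser than an exact phrase: it should also accept any other single word between “lucene” and “action”.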
If you really want to search for the phrase “lucene in”, you will need to select an analyzer that keeps stopwords (like the SimpleAnalyzer). In Lucene 2.9, just replace your call to the StandardAnalyzer with:
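A sketch of the swap, assuming the same `analyzer` variable is then passed to both the IndexWriter and any QueryParser:

```java
import org.apache.lucene.analysis.SimpleAnalyzer;

// SimpleAnalyzer lowercases and splits on non-letters but removes no stopwords,
// so "in" is indexed like any other term (Lucene 2.9.x no-argument constructor).
SimpleAnalyzer analyzer = new SimpleAnalyzer();
```

With this analyzer and Index.ANALYZED, your original PhraseQuery for “lucene” followed by “in” with slop 0 should find “Lucene in Action”.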
Or, if you’re using version 3.1 or higher, you need to add the version information:
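For example, as a sketch; `Version.LUCENE_31` is only a placeholder here, so pick the constant that matches the release you actually run:

```java
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.util.Version;

// From 3.1 on, SimpleAnalyzer takes the compatibility version as an argument.
SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_31);
```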
Here is a helpful post on a similar issue (this will help you get going with PhraseQuery):
Exact Phrase search using Lucene? (see WhiteFang34’s answer).