I was looking for good code for searching an index using Lucene.NET. I found one example that looks promising, but a few things about it confuse me. If anyone familiar with Lucene.NET could look at the code and tell me why the author constructed it this way, I would appreciate it.
I got the code from the following URL:
http://www.codeproject.com/Articles/320219/Lucene-Net-ultra-fast-search-for-MVC-or-WebForms
Here is the code:
    private static IEnumerable<SampleData> _search(string searchQuery, string searchField = "") {
        // validation
        if (string.IsNullOrEmpty(searchQuery.Replace("*", "").Replace("?", ""))) return new List<SampleData>();

        // set up lucene searcher
        using (var searcher = new IndexSearcher(_directory, false)) {
            var hits_limit = 1000;
            var analyzer = new StandardAnalyzer(Version.LUCENE_29);

            // search by single field
            if (!string.IsNullOrEmpty(searchField)) {
                var parser = new QueryParser(Version.LUCENE_29, searchField, analyzer);
                var query = parseQuery(searchQuery, parser);
                var hits = searcher.Search(query, hits_limit).ScoreDocs;
                var results = _mapLuceneSearchResultsToDataList(hits, searcher);
                analyzer.Close();
                searcher.Close();
                searcher.Dispose();
                return results;
            }
            // search by multiple fields (ordered by RELEVANCE)
            else {
                var parser = new MultiFieldQueryParser
                    (Version.LUCENE_29, new[] { "Id", "Name", "Description" }, analyzer);
                var query = parseQuery(searchQuery, parser);
                var hits = searcher.Search
                    (query, null, hits_limit, Sort.RELEVANCE).ScoreDocs;
                var results = _mapLuceneSearchResultsToDataList(hits, searcher);
                analyzer.Close();
                searcher.Close();
                searcher.Dispose();
                return results;
            }
        }
    }
I have a couple of questions about the above routine:
1) Why does the author replace every * and ? in the search term with an empty string?
2) Why search once with QueryParser and again with MultiFieldQueryParser?
3) How does the author detect whether the search term is one word or several words separated by spaces?
4) How can a wildcard search be done using this code? Where would I change the code to handle wildcards?
5) How do I handle searches for similar words? For example, if someone searches for "helo", results related to "hello" should come back.
6) About this line:
    var hits = searcher.Search(query, 1000).ScoreDocs;
If my search matches 5000 records and I limit the results to 1000, how can I show the remaining 4000 with pagination? What is the purpose of the limit? I assume it is for speed, but if I specify a limit, how can I show the other results? What would the logic be?
I would be glad if someone could discuss all my points. Thanks.
1) Because those are special characters for wildcard search. The author checks whether the search query contains anything besides wildcards. You don’t usually want to search for “*” alone, for example.
2) He doesn’t search with QueryParsers per se; he parses the search query (a string) and builds a set of objects from it. Those objects are then consumed by a Searcher object, which performs the actual search.
3) That’s something the Parser object should take care of, not the developer.
4) The wildcards are specified in the searchQuery parameter. Specifying “test*” will count as a wildcard, for example. Details are here.
5) I think you want a fuzzy search. Here’s an article about that.
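To make the wildcard check, the wildcard syntax, and the fuzzy search mentioned above more concrete, here is a minimal sketch in plain C#. The class `QueryText` and its method names are hypothetical and not part of Lucene.NET or the original article; the only Lucene facts relied on are that the query parser syntax uses a trailing `*` for prefix wildcards and a trailing `~` for fuzzy terms. It assumes simple space-separated terms and does not handle quoted phrases, boolean operators, or escaping of other special characters.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of Lucene.NET or the original article.
public static class QueryText
{
    // Mirrors the validation line in _search: is anything left once the
    // wildcard characters * and ? are removed? (Null-safe, unlike the original.)
    public static bool HasSearchableText(string searchQuery)
    {
        if (string.IsNullOrEmpty(searchQuery)) return false;
        return searchQuery.Replace("*", "").Replace("?", "").Length > 0;
    }

    // Appends the prefix wildcard to every term: "hel wor" becomes "hel* wor*".
    public static string ToPrefixQuery(string searchQuery)
    {
        return AppendToEachTerm(searchQuery, "*");
    }

    // Appends the fuzzy operator to every term: "helo" becomes "helo~".
    public static string ToFuzzyQuery(string searchQuery)
    {
        return AppendToEachTerm(searchQuery, "~");
    }

    private static string AppendToEachTerm(string searchQuery, string suffix)
    {
        var terms = searchQuery
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            // Drop operators the user already typed so they are not doubled.
            .Select(t => t.TrimEnd('*', '?', '~'))
            // Skip terms that consisted only of operators.
            .Where(t => t.Length > 0)
            .Select(t => t + suffix);
        return string.Join(" ", terms);
    }
}
```

The resulting string would then be passed as searchQuery to the routine from the question, so the parser sees "helo~" instead of "helo". Whether fuzzy matching on every term is desirable depends on the data; it is usually slower than an exact term search.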
UPD: About multiple fields. The logic is as follows:
- If searchField is specified, then the simple parser is used, which will produce a query like searchField: value1 searchField: value2... etc.
- Otherwise, searchQuery can specify fields and values like "field1: value1 field2: value2". An example is on the same syntax page that I mentioned previously.
UPD2: Don’t hesitate to look for Java documentation and examples for Lucene, as this is originally a Java project (hence, there are a lot of Java examples and tutorials). Lucene.NET is a port, and both projects share a lot of functionality and classes.
UPD3: About fuzzy search, you might also want to implement your own analyzer for synonym search (we used that technique in one of the commercial projects I worked on, to handle common typos along with synonyms).
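On question 6 (paging past the first hits), a common approach is to ask the searcher for only as many top hits as the requested page needs and then skip the earlier pages. The limit exists because the searcher collects just the top N documents by score, which bounds the work and memory per search. The sketch below is plain C# under that assumption; the class `Paging` and its members are hypothetical names, not Lucene.NET API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Lucene.NET or the original article.
public static class Paging
{
    // How many top hits to request so that the given page (1-based)
    // is fully covered.
    public static int HitsNeeded(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException("pageNumber");
        if (pageSize < 1) throw new ArgumentOutOfRangeException("pageSize");
        return pageNumber * pageSize;
    }

    // Slices one page out of the ranked hits by skipping the earlier pages.
    public static List<T> GetPage<T>(IEnumerable<T> rankedHits, int pageNumber, int pageSize)
    {
        int skip = HitsNeeded(pageNumber, pageSize) - pageSize;
        return rankedHits.Skip(skip).Take(pageSize).ToList();
    }
}

// Possible use with the routine from the question (assumed, untested):
//   var hits = searcher.Search(query, Paging.HitsNeeded(page, size)).ScoreDocs;
//   var pageHits = Paging.GetPage(hits, page, size);
//   var results = _mapLuceneSearchResultsToDataList(pageHits, searcher);
```

With this shape, page 1 of size 10 requests 10 hits, page 2 requests 20 and skips the first 10, and so on. The total hit count reported with the search result can be used to compute the number of pages. Re-running the query for each page is usually acceptable because only document ids and scores are collected until the documents are actually loaded.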