The Archive Base

Editorial Team
Asked: May 28, 2026

I have a Lucene index containing documents like these:

    _id | Name                    | Alternate Names | Population
    123 | Bosc de Planavilla      |                 |       5000
    345 | Planavilla              |                 |      20000
    456 | Bosc de la Planassa     |                 |       1000
    567 | Bosc de Plana en Blanca |                 |     100000

(The Alternate Names column holds some names in other languages.)

What’s the best Lucene query type to use, and how should I structure it, given that I need the following:

  1. If a user queries for:
    “Italian Restaurant near Bosc de Planavilla”
    I want document 123 to be returned because it contains an exact match with the name of the document.

  2. If a user queries for:
    “Italian Restaurant near Planavilla”
    I want document 345 because the query contains an exact match AND it has the highest population.

  3. If a user queries for:
    “Italian Restaurant near Bosc”
    I want document 567 because the query contains “Bosc” AND, of the three “Bosc” documents, it has the highest population.
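
The three requirements above can be read as one ranking rule: the longest city name that appears word-for-word in the query wins, and ties (including "no exact name match, only a shared word") go to the larger population. The following is a minimal, Lucene-free sketch of that rule in plain Java, useful as a test oracle for whatever Lucene query is chosen. All names (`LocationRanking`, `City`, `resolve`, `sampleCities`) are hypothetical, and treating "de", "la" and "en" as stop words is an assumption.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical, Lucene-free model of the requirements: exact name first, then population. */
public class LocationRanking {

    record City(int id, String name, int population) {}

    // Assumed stop words: sharing only "de", "la" or "en" with a name is not a match.
    static final Set<String> STOP_WORDS = Set.of("de", "la", "en");

    static List<String> words(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // Word count of the city name if the whole name appears contiguously in the query, else 0.
    static int exactMatchLength(List<String> query, City city) {
        List<String> name = words(city.name());
        return Collections.indexOfSubList(query, name) >= 0 ? name.size() : 0;
    }

    // True if the query contains at least one non-stop-word of the city name.
    static boolean sharesWord(List<String> query, City city) {
        for (String w : words(city.name())) {
            if (!STOP_WORDS.contains(w) && query.contains(w)) {
                return true;
            }
        }
        return false;
    }

    // Longest exact name match wins; ties (including "no exact match") go to the larger population.
    static Optional<City> resolve(String queryText, List<City> cities) {
        List<String> query = words(queryText);
        Comparator<City> rank = Comparator
                .comparingInt((City c) -> exactMatchLength(query, c))
                .thenComparingInt(City::population);
        return cities.stream().filter(c -> sharesWord(query, c)).max(rank);
    }

    // The four documents from the question.
    static List<City> sampleCities() {
        return List.of(
                new City(123, "Bosc de Planavilla", 5000),
                new City(345, "Planavilla", 20000),
                new City(456, "Bosc de la Planassa", 1000),
                new City(567, "Bosc de Plana en Blanca", 100000));
    }

    public static void main(String[] args) {
        for (String q : List.of("Italian Restaurant near Bosc de Planavilla",
                                "Italian Restaurant near Planavilla",
                                "Italian Restaurant near Bosc")) {
            System.out.println(q + " -> " + resolve(q, sampleCities()).map(City::id).orElse(-1));
        }
    }
}
```

Running `main` resolves the three example queries to 123, 345 and 567. Using the matched name's word count is only one way to encode "an exact match beats a partial match"; other weightings are possible.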

There are probably many other use cases, but you get the idea of what I need.

What kind of query will do this for me?
Should I generate word n-grams (shingles), create an ORed boolean query from the shingles, and then apply custom scoring? Or will a regular phrase query do? I also saw DisjunctionMaxQuery but don't know if it's what I'm looking for.
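
To make the shingle idea concrete, here is a minimal plain-Java sketch (class and method names are hypothetical) that lists every run of one to `maxSize` consecutive words in a query. This is roughly what Lucene's `ShingleFilter` emits with unigrams enabled, assuming a single-space separator. Each shingle could then be added as a `BooleanClause.Occur.SHOULD` clause of a `BooleanQuery` to get the ORed query, with custom scoring layered on top.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: the word shingles (n-grams) of a query string. */
public class QueryShingles {

    // All runs of 1..maxSize consecutive words, ordered by start position and then length.
    static List<String> shingles(String text, int maxSize) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder shingle = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= words.length; len++) {
                if (len > 1) {
                    shingle.append(' ');
                }
                shingle.append(words[start + len - 1]);
                out.add(shingle.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each shingle could become one optional (SHOULD) clause of an ORed query.
        for (String s : shingles("near Bosc de Planavilla", 3)) {
            System.out.println(s);
        }
    }
}
```

With `maxSize = 3`, "Bosc de Planavilla" is one of the generated shingles, so an exact three-word name can match as a single term if the index side is shingled the same way.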

The idea, as you might have understood by now, is to find the exact location the user implied in the query. From there I can start my geo search and add further querying around it.

What’s the best approach?

Thanks in advance.

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 3:20 am

    Here is the code for sorting as well. That said, I think it would make more sense to add custom scoring that takes city size into account rather than brute-forcing a sort on population. Also note that this uses the FieldCache, which may not be the best solution in terms of memory usage.

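    // Assumes Lucene 3.0.x plus the contrib analyzers jar (for ShingleFilter), Guava and JUnit 4.
    import java.io.IOException;
    import java.io.Reader;

    import com.google.common.collect.ImmutableList;
    import com.google.common.collect.ImmutableSet;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.StopFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.WhitespaceTokenizer;
    import org.apache.lucene.analysis.shingle.ShingleFilter;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.queryParser.ParseException;
    import org.apache.lucene.queryParser.QueryParser;
    import org.apache.lucene.search.FieldCache;
    import org.apache.lucene.search.FieldComparator;
    import org.apache.lucene.search.FieldComparatorSource;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.util.Version;
    import org.junit.After;
    import org.junit.Before;
    import org.junit.Test;

    public class ShingleFilterTests {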
        private Analyzer analyzer;
        private IndexSearcher searcher;
        private IndexReader reader;
        private QueryParser qp;
        private Sort sort;
    
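        // Whitespace tokens, minus the stop words "de", "la" and "en", optionally combined
        // into word shingles of up to `shingles` words (0 disables shingling).
        public static Analyzer createAnalyzer(final int shingles) {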
            return new Analyzer() {
                @Override
                public TokenStream tokenStream(String fieldName, Reader reader) {
                    TokenStream tokenizer = new WhitespaceTokenizer(reader);
                    tokenizer = new StopFilter(false, tokenizer, ImmutableSet.of("de", "la", "en"));
                    if (shingles > 0) {
                        tokenizer = new ShingleFilter(tokenizer, shingles);
                    }
                    return tokenizer;
                }
            };
        }
    
        // Sorts hits on the integer "population" field, largest population first.
        public class PopulationComparatorSource extends FieldComparatorSource {
            @Override
            public FieldComparator newComparator(String fieldname, int numHits, int sortPos, boolean reversed) throws IOException {
                return new PopulationComparator(fieldname, numHits);
            }
    
            private class PopulationComparator extends FieldComparator {
                private final String fieldName;
                private Integer[] values;
                private int[] populations;
                private int bottom;
    
                public PopulationComparator(String fieldname, int numHits) {
                    values = new Integer[numHits];
                    this.fieldName = fieldname;
                }
    
                // Descending order: the slot with the larger population sorts first.
                @Override
                public int compare(int slot1, int slot2) {
                    if (values[slot1] > values[slot2]) return -1;
                    if (values[slot1] < values[slot2]) return 1;
                    return 0;
                }
    
                @Override
                public void setBottom(int slot) {
                    bottom = values[slot];
                }
    
                @Override
                public int compareBottom(int doc) throws IOException {
                    int value = populations[doc];
                    if (bottom > value) return -1;
                    if (bottom < value) return 1;
                    return 0;
                }
    
                @Override
                public void copy(int slot, int doc) throws IOException {
                    values[slot] = populations[doc];
                }
    
                @Override
                public void setNextReader(IndexReader reader, int docBase) throws IOException {
                    // XXX: loads every population value of this segment into the FieldCache (memory-heavy).
                    populations = FieldCache.DEFAULT.getInts(reader, fieldName);
                }
    
                @Override
                public Comparable value(int slot) {
                    return values[slot];
                }
            }
        }
    
        @Before
        public void setUp() throws Exception {
            Directory dir = new RAMDirectory();
            analyzer = createAnalyzer(3);
    
            IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.UNLIMITED);
            ImmutableList<String> cities = ImmutableList.of("Bosc de Planavilla", "Planavilla", "Bosc de la Planassa",
                                                                   "Bosc de Plana en Blanca");
            ImmutableList<Integer> populations = ImmutableList.of(5000, 20000, 1000, 100000);
    
            for (int id = 0; id < cities.size(); id++) {
                Document doc = new Document();
                doc.add(new Field("id", String.valueOf(id), Field.Store.YES, Field.Index.NOT_ANALYZED));
                doc.add(new Field("city", cities.get(id), Field.Store.YES, Field.Index.ANALYZED));
                doc.add(new Field("population", String.valueOf(populations.get(id)),
                                         Field.Store.YES, Field.Index.NOT_ANALYZED));
                writer.addDocument(doc);
            }
            writer.close();
    
            qp = new QueryParser(Version.LUCENE_30, "city", createAnalyzer(0));
            sort = new Sort(new SortField("population", new PopulationComparatorSource()));
            searcher = new IndexSearcher(dir);
            searcher.setDefaultFieldSortScoring(true, true);
            reader = searcher.getIndexReader();
        }
    
        @After
        public void tearDown() throws Exception {
            searcher.close();
        }
    
        @Test
        public void testShingleFilter() throws Exception {
            System.out.println("shingle filter");
    
            printSearch("city:\"Bosc de Planavilla\"");
            printSearch("city:Planavilla");
            printSearch("city:Bosc");
        }
    
        private void printSearch(String query) throws ParseException, IOException {
            Query q = qp.parse(query);
            System.out.println("query " + q);
            TopDocs hits = searcher.search(q, null, 4, sort);
            System.out.println("results " + hits.totalHits);
            int i = 1;
            for (ScoreDoc dc : hits.scoreDocs) {
                Document doc = reader.document(dc.doc);
                System.out.println(i++ + ". " + dc + " \"" + doc.get("city") + "\" population: " + doc.get("population"));
            }
            System.out.println();
        }
    }
    

    This gives the following results:

    query city:"Bosc Planavilla"
    results 1
    1. doc=0 score=1.143841[5000] "Bosc de Planavilla" population: 5000
    
    query city:Planavilla
    results 2
    1. doc=1 score=1.287682[20000] "Planavilla" population: 20000
    2. doc=0 score=0.643841[5000] "Bosc de Planavilla" population: 5000
    
    query city:Bosc
    results 3
    1. doc=3 score=0.375[100000] "Bosc de Plana en Blanca" population: 100000
    2. doc=0 score=0.5[5000] "Bosc de Planavilla" population: 5000
    3. doc=2 score=0.5[1000] "Bosc de la Planassa" population: 1000
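
As the answer notes, custom scoring that takes city size into account may make more sense than a hard sort on population. A minimal plain-Java sketch of that alternative, re-ranking the `city:Bosc` hits printed above: the formula `textScore * log10(1 + population)` is an assumption chosen for illustration, not part of the answer, and the class and method names are hypothetical. In Lucene 3.x, a formula like this could live in a `CustomScoreQuery` (package `org.apache.lucene.search.function`) instead of a `Sort`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical re-ranking: blends the text score with population instead of sorting on population alone. */
public class PopulationBoost {

    record Hit(int doc, String city, double textScore, int population) {
        // Assumed formula: text score scaled by the population's order of magnitude.
        // log10(1 + population) avoids log10(0) for a population of 0.
        double blended() {
            return textScore * Math.log10(1 + population);
        }
    }

    // Returns a copy of the hits, highest blended score first.
    static List<Hit> rerank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingDouble(Hit::blended).reversed());
        return sorted;
    }

    // Text scores and populations for the query city:Bosc, taken from the output above.
    static List<Hit> boscHits() {
        return List.of(
                new Hit(3, "Bosc de Plana en Blanca", 0.375, 100000),
                new Hit(0, "Bosc de Planavilla", 0.5, 5000),
                new Hit(2, "Bosc de la Planassa", 0.5, 1000));
    }

    public static void main(String[] args) {
        for (Hit h : rerank(boscHits())) {
            System.out.printf("doc=%d blended=%.3f \"%s\"%n", h.doc(), h.blended(), h.city());
        }
    }
}
```

For these hits the order stays 3, 0, 2, but doc 3 now leads doc 0 only narrowly (about 1.875 vs 1.850), so the text score still has a say. The log base and weighting are tuning choices.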
    

Related Questions

I have created a lucene index . I would like to get all documents
I have a Lucene index that contains documents that have a type field, this
I have a Lucene index that has several documents in it. Each document has
We have set up an Solr index containing 36 million documents (~1K-2K each) and
Let's say we have a Lucene index having few documents indexed using StopAnalyzer.ENGLISH_STOP_WORDS_SET .
I have a Lucene index of around 22,000 lucene documents but I have been
I have some documents stored in a Lucene index with a docId field. I
I have a simple lucene index, that contains some demo documents: Title, Keywords, H1Tag
Lucene: I would like to do a search on the index which I have
I have a Solr/Lucene index file of approximately 700 Gb. The documents that I

© 2021 The Archive Base. All Rights Reserved