The Archive Base

Editorial Team
Asked: May 12, 2026

I am trying to get a particular search to work and it is proving problematic. The actual source data is quite complex but can be summarised by the following example:

I have articles that are indexed so that they can be searched. Each article also has multiple properties associated with it, which are also indexed and searchable. When users search, they can get hits in either the main article or the associated properties. Regardless of where a hit is achieved, the article is returned as the search hit (i.e. the properties are never a hit in their own right).

Now for the complexity:

Each property has security on it, which means that any given user may or may not be able to see it. If a user cannot see a property, they obviously do not get a search hit in it. This security check is proprietary and cannot be done using the typical mechanism of storing a role in the index alongside the other fields in the document.

I currently have an index that contains the articles and properties indexed separately (i.e. an article is indexed as a document, and each property has its own document). When a search happens, a hit in article A or in any of the properties of article A should be classed as a hit for article A alone, with the scores combined.
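The grouping rule above (a hit in an article document or in any of its property documents counts once, for the article) can be sketched without Lucene as a plain aggregation. This is a hypothetical illustration, not the original Scorer: it assumes every indexed document can be mapped back to its owning article ID (for example via a stored field, represented here by the made-up `docToArticle` map), and it combines scores by summing, which is only one possible choice.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One raw hit: a Lucene-style document number and its score. */
final class DocHit {
    final int doc;
    final float score;

    DocHit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class ArticleGrouping {
    /**
     * Collapses document hits into article hits. docToArticle is a
     * hypothetical lookup from document number to the article that owns it
     * (the article document itself or one of its property documents).
     * Scores of hits that belong to the same article are summed.
     */
    static Map<String, Float> groupByArticle(List<DocHit> hits, Map<Integer, String> docToArticle) {
        Map<String, Float> combined = new HashMap<>();
        for (DocHit hit : hits) {
            String article = docToArticle.get(hit.doc);
            if (article == null) {
                continue; // document not mapped to any article; ignore it
            }
            combined.merge(article, hit.score, Float::sum);
        }
        return combined;
    }
}
```

With documents 0 and 1 owned by article A and document 2 by article B, hits scored 0.5, 0.25 and 1.0 collapse to two results: A with 0.75 and B with 1.0.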

To achieve this originally, Lucene v1.3 was modified so that BooleanQuery used a custom Scorer, which applied the security check and treated hits in two different documents (the article and one of its properties) as a hit in a single document. I am trying to upgrade to the latest version (v2.3.2; I am using Lucene.Net), ideally without having to modify Lucene in any way.

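An additional problem occurs with AND searches. If an article contains the word foo and one of its properties contains the word bar, then searching for “foo AND bar” should return the article as a hit. Because the two terms are in different Lucene documents, however, a standard BooleanQuery finds no single document that contains both, so it returns nothing. My current code deals with this inside the custom Scorer.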

Any ideas how/if this can be done?

I am thinking along the lines of using a custom HitCollector and passing that into the search, but when doing the boolean search “foo AND bar”, execution never reaches my HitCollector as the ConjunctionScorer filters out all of the results from the sub-queries before getting there.
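To make the AND problem concrete: a conjunction evaluated per Lucene document requires every term to occur in the same document, so an article containing foo whose property contains bar never matches. The self-contained sketch below (hypothetical names, no Lucene involved, each document reduced to a set of terms) contrasts that with evaluating the conjunction over the pooled terms of an article and all of its property documents, which is the behaviour the question asks for.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ConjunctionDemo {
    /** Per-document AND: a document matches only if it contains every term. */
    static List<String> perDocument(Map<String, Set<String>> docTerms, List<String> terms) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : docTerms.entrySet()) {
            if (e.getValue().containsAll(terms)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    /**
     * Per-article AND: the terms of an article's own document and of all its
     * property documents are pooled before the conjunction is checked.
     */
    static List<String> perArticle(Map<String, Set<String>> docTerms,
                                   Map<String, String> docToArticle,
                                   List<String> terms) {
        Map<String, Set<String>> pooled = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> e : docTerms.entrySet()) {
            String article = docToArticle.get(e.getKey());
            if (article == null) {
                continue; // document not owned by any known article
            }
            pooled.computeIfAbsent(article, k -> new HashSet<>()).addAll(e.getValue());
        }
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : pooled.entrySet()) {
            if (e.getValue().containsAll(terms)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }
}
```

For an article document containing foo plus a property document containing bar, `perDocument` returns an empty list for the query foo AND bar, while `perArticle` returns the article. This mirrors why the stock ConjunctionScorer discards these candidates before a HitCollector ever sees them.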


EDIT:

Whether or not a user can see a property is not based on the property itself, but on the value of the property. I cannot therefore put the extra security conditions into the query upfront as I don’t know the value to filter by.

As an example:

+---------+------------+------------+
| Article | Property 1 | Property 2 |
+---------+------------+------------+
|    A    |     X      |     J      |
|    B    |     Y      |     K      |
|    C    |     Z      |     L      |
+---------+------------+------------+

If a user can see everything, then searching for “B AND Y” will return a single search result for article B.

If another user cannot see a property whose value contains Y, then searching for “B AND Y” will return no hits.

I have no way of knowing upfront which values a user can and cannot see. The only way to tell is to perform the security check (currently done when a hit from a field in the document is filtered), which I obviously cannot do for every possible data value for each user.
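One way to picture this constraint: visibility is a predicate over a property's value that can only be evaluated once a hit is found, so it has to run at hit-collection time rather than being folded into the query. In this hypothetical sketch, `canSee` stands in for the proprietary security check, each document's text is assumed to be the single value shown in the table, and “B AND Y” is evaluated per article over the table above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** One row of the example table: the article's own text and its property values. */
final class SecureArticle {
    final String name;
    final List<String> properties;

    SecureArticle(String name, String... properties) {
        this.name = name;
        this.properties = Arrays.asList(properties);
    }
}

final class SecureSearch {
    /**
     * AND search evaluated per article. canSee stands in for the proprietary
     * check and is only called for property values that actually match a term,
     * i.e. at the point where a hit is found.
     */
    static List<String> search(List<SecureArticle> articles, List<String> terms,
                               Predicate<String> canSee) {
        List<String> results = new ArrayList<>();
        for (SecureArticle article : articles) {
            boolean allTermsMatched = true;
            for (String term : terms) {
                // A hit in the article document itself needs no property check.
                boolean matched = article.name.equals(term);
                for (String value : article.properties) {
                    if (!matched && value.equals(term) && canSee.test(value)) {
                        matched = true; // visible property hit counts for the article
                    }
                }
                if (!matched) {
                    allTermsMatched = false;
                    break;
                }
            }
            if (allTermsMatched) {
                results.add(article.name);
            }
        }
        return results;
    }
}
```

With `canSee` accepting everything, the query returns [B]; with `v -> !v.contains("Y")` it returns an empty list, matching the two users described above.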

1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 2:54 pm

    Having now implemented this (after a lot of head-scratching and stepping through Lucene searches), I thought I’d post back on how I achieved it.

    Because I am interested in all of the results (i.e. not a page at a time), I can avoid using the Hits object (which has been deprecated in later versions of Lucene anyway). This means I can do my own hit collection using the Search(Weight, Filter, HitCollector) method of IndexSearcher, iterating over all possible results and combining document hits as appropriate. To do this, I had to hook into Lucene’s querying mechanism, but only when AND or NOT clauses are present. This is achieved by:
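    A collector that keeps every hit, rather than a top-N page, is small. So that this sketch compiles without Lucene, it declares a local stand-in `HitCollector` with the same shape as Lucene 2.x Java's abstract `HitCollector.collect(int doc, float score)` (Lucene.Net spells it `Collect`); `AllHitsCollector` and its nested `Hit` class are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Local stand-in mirroring Lucene 2.x's abstract HitCollector, so this compiles on its own. */
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

/**
 * Keeps every hit instead of a top-N page. In Lucene, collect is called once
 * per matching document, not necessarily in score order.
 */
final class AllHitsCollector extends HitCollector {
    /** One collected hit. */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final List<Hit> hits = new ArrayList<>();

    @Override
    public void collect(int doc, float score) {
        hits.add(new Hit(doc, score)); // no paging and no cut-off: keep everything
    }

    List<Hit> hits() {
        return Collections.unmodifiableList(hits);
    }
}
```

    In real code the class would extend Lucene's own HitCollector and be handed to IndexSearcher's Search method; the stand-in only shows how little state a collect-everything collector needs.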

    1. Creating a custom QueryParser and overriding GetBooleanQuery(ArrayList, bool) to return my own implementation.
    2. Creating a custom BooleanQuery (returned from the custom QueryParser) and overriding CreateWeight(Searcher) to return my own implementation.
    3. Creating a custom Weight (returned from the custom BooleanQuery) and overriding Scorer(IndexReader) to return my own implementation.
    4. Creating a custom BooleanScorer2 (returned from the custom Weight) and overriding the Score(HitCollector) method. This is what deals with the custom logic.

    This might seem like a lot of classes, but most of them derive from a Lucene class and just override a single method.
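    The shape of that chain can be shown with minimal stand-in classes. The `Stub*` base classes and `Secure*` subclasses below are hypothetical and deliberately simplified (the real Lucene.Net methods are GetBooleanQuery(ArrayList, bool), CreateWeight(Searcher) and Scorer(IndexReader)); the point is only that each layer overrides one factory method to hand back the next custom layer, so parsing a query ends in the custom scorer.

```java
import java.util.List;

// Hypothetical stand-ins for the Lucene classes in steps 1-4. Real signatures
// differ; only the shape of the override chain is reproduced here.
class StubScorer {
    String describe() { return "default scoring"; }
}

class StubWeight {
    StubScorer scorer() { return new StubScorer(); }
}

class StubBooleanQuery {
    final List<String> clauses;

    StubBooleanQuery(List<String> clauses) { this.clauses = clauses; }

    StubWeight createWeight() { return new StubWeight(); }
}

class StubQueryParser {
    StubBooleanQuery getBooleanQuery(List<String> clauses) { return new StubBooleanQuery(clauses); }
}

// Step 4: the scorer is where the custom combination logic would live.
class SecureScorer extends StubScorer {
    @Override
    String describe() { return "custom combination"; }
}

// Step 3: the weight only swaps in the custom scorer.
class SecureWeight extends StubWeight {
    @Override
    StubScorer scorer() { return new SecureScorer(); }
}

// Step 2: the query only swaps in the custom weight.
class SecureBooleanQuery extends StubBooleanQuery {
    SecureBooleanQuery(List<String> clauses) { super(clauses); }

    @Override
    StubWeight createWeight() { return new SecureWeight(); }
}

// Step 1: the parser only swaps in the custom query.
class SecureQueryParser extends StubQueryParser {
    @Override
    StubBooleanQuery getBooleanQuery(List<String> clauses) { return new SecureBooleanQuery(clauses); }
}
```

    Because callers keep using the base types, nothing outside the parser needs to know the custom classes exist.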

    The Score(HitCollector) override in the custom BooleanScorer2 class is responsible for the custom logic. If there are no required sub-scorers, the scoring can be passed to the base Score method and run as normal. If there are required sub-scorers, it means there was a NOT or an AND clause in the query. In this case, the special combination logic mentioned in the question comes into play. I have a class called ConjunctionScorer that does this (this is not related to the ConjunctionScorer in Lucene).

    The ConjunctionScorer takes a list of scorers and iterates over them. For each one, I extract the hits and their scores (using the Doc() and Score() methods) and create my own search hits collection containing only those hits that the current user can see after performing the relevant security checks. If a hit has already been found by another scorer, I combine them together (using the mean of their scores for their new score). If a hit is from a prohibited scorer, I remove the hit if it was already found.
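    A minimal sketch of one reading of this combination step, written without Lucene: each sub-scorer is reduced to its list of (Doc(), Score()) pairs, `articleOf` and `canSee` are hypothetical hooks for the document-to-article mapping and the proprietary security check, and "mean" is taken over all contributing hits. Two details are assumptions that go slightly beyond the description above: an article is kept only if every required clause hit one of its visible documents (the AND behaviour the question asks for), and prohibited hits are applied after all required ones, so the result does not depend on scorer order. The no-required-clauses path, which is handed back to Lucene's own Score method, is not shown.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.IntFunction;
import java.util.function.IntPredicate;

final class ArticleConjunction {
    /** A (document, score) pair as read from a sub-scorer's Doc()/Score(). */
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Running totals for one article. */
    private static final class Acc {
        float sum;
        int count;
        final Set<Integer> requiredClauses = new HashSet<>();
    }

    /** Returns article -> mean score of its visible contributing hits. */
    static Map<String, Float> combine(List<List<Hit>> required,
                                      List<List<Hit>> prohibited,
                                      IntFunction<String> articleOf,
                                      IntPredicate canSee) {
        Map<String, Acc> found = new HashMap<>();
        for (int clause = 0; clause < required.size(); clause++) {
            for (Hit hit : required.get(clause)) {
                if (!canSee.test(hit.doc)) {
                    continue; // the user cannot see this document, so it is not a hit
                }
                Acc acc = found.computeIfAbsent(articleOf.apply(hit.doc), k -> new Acc());
                acc.sum += hit.score;
                acc.count++;
                acc.requiredClauses.add(clause);
            }
        }
        // AND: every required clause must have hit a visible document of the article.
        found.values().removeIf(acc -> acc.requiredClauses.size() < required.size());
        // NOT: a visible prohibited hit removes its article (assumption: hidden ones are ignored).
        for (List<Hit> clauseHits : prohibited) {
            for (Hit hit : clauseHits) {
                if (canSee.test(hit.doc)) {
                    found.remove(articleOf.apply(hit.doc));
                }
            }
        }
        Map<String, Float> result = new HashMap<>();
        for (Map.Entry<String, Acc> e : found.entrySet()) {
            result.put(e.getKey(), e.getValue().sum / e.getValue().count);
        }
        return result;
    }
}
```

    For example, if article A matches foo in its own document and bar in a visible property, A is returned with the mean of the two scores, while an article whose only bar hit is in a hidden property is dropped, as is any article with a visible hit for a NOT clause.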

    Finally, I pass the combined hits to the HitCollector that was given to the BooleanScorer2.Score(HitCollector) method. This is a custom HitCollector that I originally passed into the IndexSearcher.Search(Query, HitCollector) method to perform the search. When that method returns, my custom HitCollector contains my search results, combined as I wanted.

    Hopefully this information will be useful to someone else faced with the same problem. It sounds like a lot of effort, but it is actually pretty trivial. Most of the work is done in combining the hits together in the ConjunctionScorer. Note that this is for Lucene v2.3.2, and may be different in later versions.

Related Questions

I am trying to get all values for a particular search regardless of casing.
I am trying to get access to all the endpoints that a particular contact
I'm trying to get meta tag information for each page on a particular site,
I am trying to get started in testing ActiveAdmin, in particular I need to
I'm messing with core.time.Duration s - in particular, I'm trying to properly get number
I'm trying to duplicate a particular entry in the form. I get all the
I am trying get Struts 2 and Tiles to work and I am using
I am trying to get out a specific row (at a particular minute in
I am trying to get the most occurring term frequencies for every particular document
I am trying to get all the data from http://www.nationwide.com/locator/home/index.x?lineOfBusiness=insurance_agent&locatorhome=fromhome&language= every state listed there.

© 2021 The Archive Base. All Rights Reserved