Is there a way to query ElasticSearch so that it returns the top results for each of several facets? For example, let’s pretend we have some users writing tweets:
user: kimchy
user_eye_color: blue
tweet: elasticsearch training early bird discounts
# Lots of other messages from blue-eyed users mentioning 'bird'
user: lord_oliver
user_eye_color: amber-green
tweet: vanquished and consumed the twitter bird. today is a good day.
If there are enough blue-eyed users (or other colors more common than amber-green) writing tweets mentioning “bird”, searching for “bird” will never surface Lord Oliver’s tweet, even if Lord Oliver’s tweet has a reasonably high score.
This is a problem because [in this hypothetical example], I want to surface results from a diversity of users. One current solution would be to add facets on eye color,
facets:
  eye_color:
    terms: {"field": "user_eye_color"}
and then perform multiple filtered searches afterward. This seems rather inefficient, however.
Question: Is there any way in ElasticSearch to return multiple result sets, either by returning top results from different facets (in this case, user_eye_color=amber-green), writing a stateful custom scoring function, or any other creative solution?
The reason I want to do this is that it’s sometimes difficult to put a total order (a floating-point score) on all search results. Suppose that all amber-green eye color users happen to be cats, and they write different types of documents (tweets). Instead of trying to force all cat-written documents into a total order with all other documents, I want Pareto-optimal documents: those that are optimal within each eye-color category. I could then do more sensible postfiltering, for example, dropping cat-written documents if there’s nothing good, and otherwise doing some kind of sensible interleaving of results. Dropping in some kind of score multiplier [based on eye color] would likely be less effective.
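To make the postfiltering idea concrete, here is a minimal Python sketch. It assumes the per-category result lists have already been fetched and are sorted by descending score; the `interleave` helper, the `min_score` threshold, and the sample hits are all hypothetical and only illustrate one possible policy (drop weak categories, then merge round-robin).

```python
from itertools import zip_longest


def interleave(groups, min_score):
    """Round-robin merge of per-category hit lists (each sorted by descending score)."""
    # Keep only hits at or above the threshold; a category with nothing good disappears.
    kept = [[h for h in hits if h["score"] >= min_score] for hits in groups.values()]
    kept = [hits for hits in kept if hits]

    merged = []
    # Take the best remaining hit from each category in turn.
    for rank_row in zip_longest(*kept):
        merged.extend(h for h in rank_row if h is not None)
    return merged


# Hypothetical per-category results, keyed by user_eye_color.
groups = {
    "blue": [{"id": "b1", "score": 2.0}, {"id": "b2", "score": 1.5}],
    "amber-green": [{"id": "a1", "score": 1.8}],
    "brown": [{"id": "r1", "score": 0.2}],
}

print([h["id"] for h in interleave(groups, min_score=1.0)])
```

Here the brown-eyed category is dropped entirely because none of its hits clears the threshold, while the amber-green tweet surfaces right after the top blue-eyed one.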
If you don’t like my toy example (or its underhand satire), consider cases where you have an index with different document types, say tweets and FBI reports …
This can now be done with the top hits aggregation.
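As a rough sketch of what such a request might look like for the example in the question: a `terms` aggregation buckets the matching tweets by `user_eye_color`, and a nested `top_hits` aggregation returns the best-scoring tweets inside each bucket. The Python below only builds and prints the request body. The helper name, the aggregation names `by_group` and `top_tweets`, and the sizes are assumptions chosen for illustration, and it assumes `user_eye_color` is indexed as a single unanalyzed term so that "amber-green" stays one bucket.

```python
import json


def build_top_hits_query(text, group_field="user_eye_color", groups=10, hits_per_group=3):
    """Build a search body that returns the top hits per value of group_field."""
    return {
        # Skip the ordinary top-level hit list; only the per-bucket hits are wanted.
        "size": 0,
        "query": {"match": {"tweet": text}},
        "aggs": {
            "by_group": {
                # One bucket per distinct eye color among the matching tweets.
                "terms": {"field": group_field, "size": groups},
                "aggs": {
                    # Best-scoring tweets within each bucket (sorted by score by default).
                    "top_tweets": {"top_hits": {"size": hits_per_group}}
                },
            }
        },
    }


body = build_top_hits_query("bird")
print(json.dumps(body, indent=2))
```

Sent as the body of a single search request, this should return each bucket's tweets under `aggregations.by_group.buckets[].top_tweets.hits.hits`, so the amber-green bucket carries Lord Oliver's tweet no matter how many blue-eyed users mention "bird".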