I’m indexing documents in Elasticsearch that contain arrays.
Sample documents:
doc1:
{
...
actors: ["Tom Cruise", "Brad Pitt", ...],
...
}
doc2:
{
...
actors: ["Brad Pitt", "Tom Cruise", ...],
...
}
When searching such documents, I would like the score to depend on the position of the match in the array. In the sample documents, searching for “Tom Cruise” should boost doc1, because the match is at position 1 there.
The only solution I can think of right now is to add a limited number of fields (something like 5) containing the first actors, each with its own boost, like:
doc1:
{
...
actors: ["Tom Cruise", "Brad Pitt", ...],
actor1: "Tom Cruise",
actor2: "Brad Pitt",
...
}
with actor1 having a boost of 5, actor2 4, and so on.
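As a rough illustration of this workaround, here is a minimal Python sketch that derives the extra fields from the array before indexing and builds the matching boost table. The helper names `add_positional_fields` and `field_boosts` are hypothetical, and the sketch only prepares the data; how the boosts are applied in the mapping or query is not shown.

```python
def add_positional_fields(doc, max_fields=5):
    """Return a copy of doc with actor1..actorN set to the first N actors."""
    enriched = dict(doc)
    for i, name in enumerate(doc.get("actors", [])[:max_fields], start=1):
        enriched[f"actor{i}"] = name
    return enriched


def field_boosts(max_fields=5):
    """Map each positional field to its boost: actor1 -> 5, actor2 -> 4, ..."""
    return {f"actor{i}": max_fields - i + 1 for i in range(1, max_fields + 1)}


doc1 = add_positional_fields({"actors": ["Tom Cruise", "Brad Pitt"]})
boosts = field_boosts()
```

Documents with fewer than `max_fields` actors simply get fewer positional fields.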
Is there a better way to handle this, perhaps using custom_score?
Thanks!
Given this
Then this query
calculates a score of 4 for the first film and 2 for the last one (if you’re pasting this into curl, I had to remove all the line breaks in the custom script)
Some caveats:
- It uses length - offset as the score, so you can only really compare things of the same length.
- doc.actors (i.e. the indexed data) only has an alphabetically sorted version of the array, which is obviously not useful, so I had to use _source, which I believe is a lot slower. It might be acceptable performance-wise if the custom_score query wraps a filtered query.
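To make the two caveats concrete, here is a small Python sketch of the length - offset arithmetic. It is not the Elasticsearch script itself; `position_score` is a hypothetical name, and returning 0 for a missing actor is an assumption made for the example.

```python
def position_score(actors, name):
    """Score a match as len(actors) - offset; earlier positions score higher."""
    try:
        offset = actors.index(name)
    except ValueError:
        return 0  # assumption: no match contributes nothing
    return len(actors) - offset


doc1_actors = ["Tom Cruise", "Brad Pitt"]
doc2_actors = ["Brad Pitt", "Tom Cruise"]

score1 = position_score(doc1_actors, "Tom Cruise")
score2 = position_score(doc2_actors, "Tom Cruise")

# A longer cast list inflates the score even though the match is later,
# which is why only arrays of the same length compare fairly.
long_cast = ["A", "B", "Tom Cruise", "C", "D"]
score_long = position_score(long_cast, "Tom Cruise")

# Once sorted alphabetically, both arrays look identical, so the original
# position cannot be recovered from the sorted values.
same_when_sorted = sorted(doc1_actors) == sorted(doc2_actors)
```

Here doc1 outscores doc2 as intended, but the five-actor list scores higher than doc1 even though its match is in third place.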