I was learning apache solr scoring methods here . Here is said that you

Question


Asked: June 18, 2026


I was learning about Apache Solr scoring methods here. It says that you should go to this page to understand the scoring formula. As I am not from a maths background, it is really hard for me to understand high-level math. Is there an alternative way to understand the basic scoring formula in an easy manner?


1 Answer

Editorial Team · Answer 1 · 2026-06-18T16:34:02+00:00

Lucene uses a number of features to score documents, but scoring basically relies on the similarity between a document and your query. I explained the idea of calculating similarity between documents earlier in more or less simple words, so let me explain it here only briefly.

If you have a dictionary of all words, you can arrange them into one very long list. Mathematicians use the term “vector” for any sequence, including a list of words, so let’s call it a vector of words:

[abbat, about, bananas, …]

We can also express each document in our collection as a vector, where each element is the number of occurrences of the corresponding word in that document. For example, if a document has 1 occurrence of the word “bananas”, 2 occurrences of “about” and no occurrences of “abbat”, then the document vector will start as follows:

[0, 2, 1, …]
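To make the counting step concrete, here is a minimal Python sketch. The three-word vocabulary, the sample sentence and the helper name `count_vector` are made up for illustration, and splitting on whitespace is a simplifying assumption; a real search engine analyzes text far more carefully.

```python
from collections import Counter

def count_vector(text, vocabulary):
    """Return one count per vocabulary word, in vocabulary order."""
    # Naive tokenization (an assumption): lowercase, then split on whitespace.
    counts = Counter(text.lower().split())
    # A Counter returns 0 for words that never occur in the text.
    return [counts[word] for word in vocabulary]

# Hypothetical tiny dictionary, mirroring the example above.
vocabulary = ["abbat", "about", "bananas"]
document = "About bananas and more about fruit"

print(count_vector(document, vocabulary))  # [0, 2, 1]
```

Words that are not in the vocabulary (such as "fruit" here) are simply ignored, which is why a real dictionary has to cover every word in the collection.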

Now comes the most interesting part. We can assume that if 2 documents have a lot of words in common, they are about similar topics, and if they have very few in common, they are very different. Since we already know that documents can be represented as vectors of words, we can calculate the similarity of documents as the similarity of their vectors.

There are many ways to calculate how similar 2 vectors are. Lucene uses quite a simple one: cosine similarity. The idea comes from the geometric representation of vectors and the angle between them: if you draw 2 vectors in 2D space, you will see that the more similar their coordinates are, the smaller the angle between them, and the closer the cosine of that angle is to 1. This is where cosine similarity comes from, but in practice you mostly need to care about the number of words that 2 documents share.
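The calculation itself is short. The sketch below is plain cosine similarity on raw word counts, not Lucene's actual scoring code, which applies additional weighting on top of this idea; the function name and the sample vectors are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    # Dot product: grows when both vectors have counts for the same words.
    dot = sum(x * y for x, y in zip(a, b))
    # Vector lengths, used to cancel out the effect of document size.
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)

doc_a = [0, 2, 1]
doc_b = [0, 4, 2]  # same words in the same proportions, just a longer text
doc_c = [3, 0, 0]  # shares no words with doc_a

print(cosine_similarity(doc_a, doc_b))  # very close to 1: same direction
print(cosine_similarity(doc_a, doc_c))  # 0.0: nothing in common
```

Dividing by the vector lengths is what makes a long document and a short document about the same topic come out as similar, rather than letting the longer one win just because it has more words.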

When talking about search engines, queries are treated just like documents: a vector is built for the query and is then used to find the most similar (i.e. relevant) documents in the collection.
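Putting the pieces together, a toy search could look roughly like this. It is only a sketch of the idea under simple assumptions (whitespace tokenization, raw counts, a made-up three-document collection and a hypothetical `rank` helper), not how Lucene or Solr is implemented.

```python
import math
from collections import Counter

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length count vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, documents):
    """Score every document against the query, best match first."""
    # The vocabulary is every word seen in the documents or the query.
    words = {w for text in documents + [query] for w in text.lower().split()}
    vocabulary = sorted(words)

    def to_vector(text):
        counts = Counter(text.lower().split())
        return [counts[w] for w in vocabulary]

    # The query is turned into a vector exactly like a document.
    query_vector = to_vector(query)
    scored = [(cosine_similarity(query_vector, to_vector(d)), d) for d in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Hypothetical mini collection.
documents = [
    "bananas are yellow",
    "apples are red",
    "all about bananas and more bananas",
]

for score, text in rank("about bananas", documents):
    print(f"{score:.2f}  {text}")
```

With this sample data, the document that contains both query words ranks first, the one that contains only "bananas" comes second, and the one that shares no words with the query gets a score of 0.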
