I have a relatively simple Lucene index, being served by Solr. The index consists of two major fields, title and body, and a few less-important fields.
Most search engines rank a match in the title above a match in the body, so I’m going to start applying an index-time boost to the title field.
My question is, what boost values do people typically use for their title fields? 2? 4? 10? 100?
I suggest you divide the median body length by the median title length. This gives you a rough factor M: a word that appears once in the title would be expected to appear about M times in the body, simply because the body is about M times longer. Then use something like M*3 as the boost. This is, of course, a rationalized heuristic, so it is best to iterate over the values and check the results after each change. See Grant Ingersoll’s ‘Debugging Relevance Issues in Search’ for a much more structured discussion.
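Here is a minimal sketch of that heuristic in Python, in case it helps to see the arithmetic. It assumes that splitting on whitespace is a good enough stand-in for whatever your analyzer produces, and that you can export a sample of titles and bodies from your index. The names `suggest_title_boost` and `multiplier` are made up for illustration; they are not part of Solr or Lucene.

```python
from statistics import median


def suggest_title_boost(titles, bodies, multiplier=3.0):
    """Return a starting title boost: (median body length / median title length) * multiplier.

    Lengths are counted in whitespace-separated tokens, which is an assumption;
    a real analyzer may tokenize differently.
    """
    # Ignore empty fields so they do not drag the medians toward zero.
    title_lengths = [len(t.split()) for t in titles if t.split()]
    body_lengths = [len(b.split()) for b in bodies if b.split()]
    if not title_lengths or not body_lengths:
        raise ValueError("need at least one non-empty title and one non-empty body")

    # M: roughly how many times longer a typical body is than a typical title.
    m = median(body_lengths) / median(title_lengths)
    return m * multiplier


if __name__ == "__main__":
    # Toy sample: titles of 2, 4 and 3 words; bodies of 30, 60 and 45 words.
    sample_titles = ["a b", "a b c d", "a b c"]
    sample_bodies = [" ".join(["w"] * n) for n in (30, 60, 45)]
    print(suggest_title_boost(sample_titles, sample_bodies))
```

Treat the number it prints as a first guess only, and adjust it after looking at real queries.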