I’m using the rord() function in Solr queries in order to boost query results against a “rank” field, using a syntax something like this:
bf=rord(cur_rank)^1.8
The algorithm works well, but recent changes in Solr indicate that using ord() and rord() is a memory hog now. From the changelog:
Searching and sorting is now done on a
per-segment basis, meaning that the
FieldCache entries used for sorting
and for function queries are created
and used per-segment and can be reused
for segments that don’t change between
index updates. While generally
beneficial, this can lead to increased
memory usage over 1.3 in certain
scenarios:[…]
2) Certain function queries
such as ord() and rord() require a top
level FieldCache instance and can thus
lead to increased memory usage.
Consider replacing ord() and rord()
with alternatives, such as function
queries based on ms() for date
boosting.
It mentions possible strategies for handling date-based boosting, but how about for a number like “rank” where rank is a number between 1 and the total number of records?
rord() seems ideal… any other strategies?
The point of using segment-based field caches is to reduce the load time. If you want to get the value of a field after having added a new segment (which is done every time you commit), you only have to load a new field cache for the newly added segment.
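This is not possible with ord() and rord(), because they return a document's ordinal position across the whole index rather than the field value of a single document. That is why they require a top-level FieldCache instead of per-segment ones.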
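To see why an ordinal cannot be cached per segment, here is a small Python sketch. It is a simplified model of a reverse ordinal, not Lucene's actual implementation, and the segment contents are made-up values for illustration only.

```python
from bisect import bisect_left

def rord(all_values, value):
    """Reverse ordinal of `value` among the distinct indexed values.

    The largest value gets 1, the next largest 2, and so on.
    Simplified model; not how Lucene computes it internally.
    """
    distinct = sorted(set(all_values))
    return len(distinct) - bisect_left(distinct, value)

# Hypothetical field values held by two segments.
segment_a = [10, 20, 30]
segment_b = [25, 40]  # added by a later commit

rord_before = rord(segment_a, 20)              # only segment_a exists
rord_after = rord(segment_a + segment_b, 20)   # after the commit

print(rord_before, rord_after)
```

The document holding 20 never changed, yet its reverse ordinal did, because the ordinal depends on every value in the index. The raw field value, by contrast, stays the same and can be cached with its segment.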
So the only solution for you would be to compute the boost based on the value of the field “cur_rank” instead of its ordinal.
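One way to do this, as a sketch rather than a tuned recommendation, is Solr's recip(x,m,a,b) function, which computes a/(m*x+b). The Python below only mirrors that formula so you can inspect the curve; the constants 1000 and the helper name rank_boost are assumptions you would need to adjust to your own rank range.

```python
def recip(x, m, a, b):
    """Mirror of Solr's recip(x,m,a,b) function query: a / (m*x + b)."""
    return a / (m * x + b)

def rank_boost(cur_rank, a=1000.0, b=1000.0):
    """Hypothetical boost curve: close to 1 for rank 1, 0.5 at rank == b."""
    return recip(cur_rank, 1.0, a, b)

# Boost values for a few sample ranks (lower rank number = better).
for rank in (1, 100, 1000, 10000):
    print(rank, round(rank_boost(rank), 4))

# The equivalent request parameter, replacing bf=rord(cur_rank)^1.8.
bf_param = "bf=recip(cur_rank,1,1000,1000)^1.8"
print(bf_param)
```

Unlike rord(), this reads only each document's own cur_rank value, so it works with per-segment field caches. The shape differs from rord(), which is linear in the ordinal, so relevance should be re-checked after switching.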
This is how date boosting now works: it used to use the rord of the date field to compute the boost, whereas it now uses the number of milliseconds between the value of the date field and the current time. See http://wiki.apache.org/solr/SolrRelevancyFAQ (“How can I boost the score of newer documents”) for more details.
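For reference, the FAQ's example is recip(ms(NOW,mydatefield),3.16e-11,1,1), where mydatefield is a placeholder field name. A rough Python sketch of what that evaluates to, with the document age in milliseconds passed in directly instead of coming from ms():

```python
MS_PER_YEAR = 3.16e10  # approximate number of milliseconds in one year

def date_boost(age_ms, m=3.16e-11, a=1.0, b=1.0):
    """Value of recip(age_ms, m, a, b) = a / (m*age_ms + b)."""
    return a / (m * age_ms + b)

boost_new = date_boost(0)                    # document indexed right now
boost_one_year = date_boost(MS_PER_YEAR)     # roughly one year old
boost_ten_years = date_boost(10 * MS_PER_YEAR)

print(boost_new, boost_one_year, boost_ten_years)
```

With these constants a brand-new document gets a multiplier of 1, and a one-year-old document gets about half of that.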