We currently run Solr across multiple servers for our image-sharing site. We have 10 million images, with 250,000 more added each month.
So far Solr does a very good job of selecting search results, but we think there is room to improve the sorting/ranking. We think that incorporating click-through rates into the ranking would improve results significantly.
We currently collect click-through data in MongoDB. For each search term, we record how many times an image is clicked versus how many times it is displayed. For example:
[image identifier], [search term], [click-through rate]
"00000001", "banana peel", "0.1565"
"00000001", "banana", "0.0216"
"00000001", "monkey banana", "0.0087"
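The rate in the last column is clicks divided by displays (impressions) for that image and term. Here is a minimal sketch of that calculation, assuming the raw click and display counts are available per (image, term) pair. The optional smoothing toward a prior rate is an addition not described above. It keeps images that have been shown only a handful of times from getting extreme rates. The counts in `counts` are hypothetical.

```python
def click_through_rate(clicks: int, impressions: int,
                       prior_ctr: float = 0.0, prior_weight: float = 0.0) -> float:
    """Clicks divided by impressions, optionally smoothed toward prior_ctr.

    With prior_weight == 0 this is the plain ratio. A positive prior_weight
    acts like that many extra impressions at prior_ctr, damping the rate for
    images that have only been displayed a few times.
    """
    denominator = impressions + prior_weight
    if denominator == 0:
        # Never displayed and no prior: fall back to the prior rate.
        return prior_ctr
    return (clicks + prior_ctr * prior_weight) / denominator


# Hypothetical counts: (image id, search term) -> (clicks, impressions)
counts = {
    ("00000001", "banana"): (3, 120),
}
ctr = {key: click_through_rate(c, n) for key, (c, n) in counts.items()}
```

Setting `prior_weight` to a few dozen impressions and `prior_ctr` to a site-wide average rate is one common way to tune the smoothing. With the defaults, the function returns the raw ratio stored in MongoDB.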
We want to find a way to incorporate this term-specific click-through data into our Solr rankings: the more often an image has been clicked for a given search term, the higher it should rank for that term. We have not yet found a clean way to do this.
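One option that keeps the click-through data entirely outside the index is to re-rank in the application. Fetch the top N results with their scores (for example with `fl=id,score`), then reorder them using the rate for the query term. This sketch assumes two things: a multiplicative blend, `score * (1 + weight * ctr)`, with a hypothetical tuning parameter `weight`, and a query string normalized the same way as the stored search terms. The formula is only an illustration, not something Solr provides.

```python
from typing import Dict, List, Sequence, Tuple


def rerank(results: Sequence[Tuple[str, float]],
           term: str,
           ctr: Dict[Tuple[str, str], float],
           weight: float = 1.0) -> List[Tuple[str, float]]:
    """Reorder (image_id, solr_score) pairs by solr_score * (1 + weight * ctr).

    Images with no click-through data for this term get ctr = 0 and keep
    their Solr score. Ties keep Solr's original order because sorted() is stable.
    """
    def blended_score(item: Tuple[str, float]) -> float:
        image_id, solr_score = item
        rate = ctr.get((image_id, term), 0.0)
        return solr_score * (1.0 + weight * rate)

    return sorted(results, key=blended_score, reverse=True)
```

Only the fetched results are reordered, so N has to be large enough for well-clicked images to appear in it. In exchange, the rates can be read fresh from MongoDB on every request.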
We would like to load the data externally rather than store it in the Solr index, because we want the click-through data to be nearly real-time and we want to keep our Solr index from growing too large.
Any ideas or thoughts would be very much appreciated!
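ExternalFileField is the most obvious solution. It lets Solr read one float value per document from a file outside the index, and that value can then be used in function queries for boosting or sorting.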
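ExternalFileField holds one float per document, keyed by a field such as the uniqueKey. Solr reads the values from a file named `external_<fieldname>` (or `external_<fieldname>.*`) in the core's data directory, with one `key=value` line per document. The field also has to be declared in the schema with class `solr.ExternalFileField`. Depending on configuration, updated files are picked up when a new searcher opens, for example with `ExternalFileFieldReloader` registered as a listener. Because the click-through data above is per term, one way to fit it into this model is a separate external field per frequent term. The field name `ctr_banana` is hypothetical. Here is a minimal sketch of exporting one term's rates to such a file:

```python
import os
import tempfile
from typing import Mapping


def write_external_file(data_dir: str, field_name: str,
                        values: Mapping[str, float]) -> str:
    """Write values as "key=value" lines to <data_dir>/external_<field_name>.

    Keys must match the keyField of the ExternalFileField (usually the
    uniqueKey). Lines are sorted by key; Solr does not require this, but
    lookups are faster when the file is sorted.
    """
    final_path = os.path.join(data_dir, "external_" + field_name)
    # Write to a temporary file whose name does NOT match external_<field>*,
    # then rename it over the real file so Solr never reads a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, prefix="tmp_")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as out:
            for key in sorted(values):
                out.write(f"{key}={values[key]}\n")
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

A scheduled job could call `write_external_file(core_data_dir, "ctr_banana", rates_for_banana)` with rates pulled from MongoDB. The field could then be referenced by name in a function query, for example as an edismax `bf=ctr_banana` parameter on queries for that term. The per-term file approach only scales to a limited number of terms.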
Also see http://www.slideshare.net/LucidImagination/bialecki-andrzej-clickthroughrelevancerankinginsolrlucidworksenterprise-8419715 for more background on this issue.