I’m curious how to normalize numbers for a ranking algorithm.
Let’s say I want to rank links by importance and I have two columns to work with,
so the table would look like:
url | comments | views
Now I want to rank comments higher than views, so my first thought is to do comments*3 or something to weight them. However, if there is a large view count like 40,000 and only 4 comments, then the comment weight gets drowned out.
So I’m thinking I have to normalize those scores to a more level playing field before I can weight them. Any ideas or pointers on how that’s usually done?
Thanks
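For each url, you could first normalize the comments and views to a percentile, that is, the fraction of urls whose value is at or below that url's value. This puts both columns on the same 0 to 1 scale regardless of their raw magnitudes.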
Then you could assign weights to each of the percentile values to compute the overall score.
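To make the percentile-then-weight idea concrete, here is a minimal sketch in Python. The function names, the sample rows, and the view weight of 1 are assumptions made for illustration; the comment weight of 3 comes from the comments*3 idea in the question. Percentile here means the fraction of rows at or below a value, which is one of several common definitions.

```python
from bisect import bisect_right


def percentile_ranks(values):
    """For each value, return the fraction of values that are <= it (range (0, 1])."""
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_right(ordered, v) / n for v in values]


def score_links(rows, comment_weight=3.0, view_weight=1.0):
    """rows is a list of (url, comments, views); returns (url, score) pairs, best first.

    The weights are illustrative assumptions, not recommended values.
    """
    if not rows:
        return []
    comment_pct = percentile_ranks([r[1] for r in rows])
    view_pct = percentile_ranks([r[2] for r in rows])
    scored = [
        (url, comment_weight * c + view_weight * v)
        for (url, _, _), c, v in zip(rows, comment_pct, view_pct)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample data: the first row is the 4 comments / 40,000 views case.
rows = [
    ("a.example", 4, 40000),
    ("b.example", 25, 900),
    ("c.example", 0, 120),
    ("d.example", 12, 5000),
]
ranking = score_links(rows)
for url, score in ranking:
    print(url, round(score, 2))
```

With this sample data, the url with 40,000 views but only 4 comments ends up third, because its view percentile can contribute at most 1.0 while a strong comment percentile can contribute up to 3.0.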
Additional strategies may involve eliminating outliers if the values cluster toward one end of the range.
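One way to handle outliers, if you scale the raw values instead of using percentiles, is to cap extreme values rather than drop the rows. The sketch below is an assumption about what that could look like, not the only approach: it clamps values at a nearest-rank percentile and then min-max scales the result. The function names, the 80th-percentile cutoff, and the sample view counts are all hypothetical.

```python
def cap_at_percentile(values, upper_pct=95):
    """Clamp values above the nearest-rank upper_pct percentile down to that value."""
    if not values:
        return []
    ordered = sorted(values)
    # Nearest-rank percentile via integer ceiling division, avoiding float rounding.
    rank = max(1, -(-upper_pct * len(ordered) // 100))
    cap = ordered[rank - 1]
    return [min(v, cap) for v in values]


def min_max_scale(values):
    """Scale values linearly to [0, 1]; all-equal input maps to 0.0."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical view counts where one url dwarfs the rest.
views = [120, 900, 1500, 5000, 40000]
capped = cap_at_percentile(views, upper_pct=80)
scaled = min_max_scale(capped)
print(capped)
print([round(s, 3) for s in scaled])
```

Without the cap, the 40,000 value would squeeze every other url into the bottom eighth of the scale; with it, the remaining values spread across the range before any weights are applied.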