When we train a CTR (click-through rate) model, we sometimes need to calculate the observed CTR from historical data, like this:
ctr = #(clicks) / #(impressions)
We know that if the number of impressions is too small, the calculated CTR is not reliable, so we usually set a threshold and keep only the items with enough impressions.
But we also know that the more impressions there are, the higher the confidence in the CTR. So my question is: is there an impressions-normalized statistical method to calculate the CTR?
Thanks!
You probably need a confidence interval for your estimated CTR. The Wilson score interval is a good one to try.
You need the following quantities to calculate the confidence interval:
- \hat p is the observed CTR (the fraction #clicks / #impressions)
- n is the total number of impressions
- z_{α/2} is the (1-α/2) quantile of the standard normal distribution
A simple implementation in Python is shown below. I use z_{α/2} = 1.96, which corresponds to a 95% confidence interval. I attached 3 test results at the end of the code.
Now you can set a threshold on the calculated confidence interval.
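For reference, the Wilson score interval built from \hat p, n and z_{α/2} can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not necessarily the code the answer refers to: the function name `wilson_interval`, the clamping of the bounds to [0, 1], and the three sample counts are illustrative choices. The interval is centered at (\hat p + z²/(2n)) / (1 + z²/n), with half-width z · sqrt(\hat p(1 - \hat p)/n + z²/(4n²)) / (1 + z²/n).

```python
import math


def wilson_interval(clicks, impressions, z=1.96):
    """Return (lower, upper) Wilson score bounds for the CTR.

    z is the (1 - alpha/2) quantile of the standard normal
    distribution; 1.96 corresponds to a 95% confidence interval.
    The function name and signature are illustrative assumptions.
    """
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    if not 0 <= clicks <= impressions:
        raise ValueError("clicks must be between 0 and impressions")

    n = impressions
    p_hat = clicks / n              # observed CTR
    z2 = z * z
    denom = 1 + z2 / n

    center = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom

    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical counts: the same observed CTR (10%) at three sample sizes.
for clicks, impressions in [(1, 10), (10, 100), (100, 1000)]:
    lower, upper = wilson_interval(clicks, impressions)
    print(f"{clicks}/{impressions}: [{lower:.4f}, {upper:.4f}]")
```

The interval narrows as the number of impressions grows, even though the observed CTR stays the same. One possible way to use it is to threshold or rank on the lower bound instead of the raw CTR, which penalizes items with few impressions.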