I am generating many samples of N normal random variates (mean 0, SD 1) with NumPy/SciPy and then taking the standard deviation of each sample with ddof = 1, which should presumably give me an unbiased estimator of the standard deviation. The process is roughly as follows:
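```python
import numpy as np
import scipy.stats

def genData(samples=20, mean=333.8, sd=3.38):
    # Draw `samples` normal variates; return their sample mean and sample SD (ddof=1)
    bl = scipy.stats.norm.rvs(loc=mean, scale=sd, size=samples)
    return [np.mean(bl), np.std(bl, ddof=1)]

means = {}
sds = {}
n = 50000
for size in range(5, 21):
    x = [genData(size, mean=0, sd=1) for _ in range(n)]
    # List comprehensions instead of map(), which is lazy in Python 3
    means[size] = [d[0] for d in x]
    sds[size] = [d[1] for d in x]
```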
However, I observe the following KDEs of the sample standard deviations instead:
[Figure: KDE plot, ddof = 1]

[Figure: KDE plot, ddof = 2]

Pardon the rough curves; they are due to the small sample size.
There is a clear bias with ddof = 1, which is eliminated with ddof = 2. What am I doing wrong here?
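The square root of an unbiased estimator of the variance is not necessarily an unbiased estimator of the square root of the variance. In mathematical terms, $\hat V = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$ is an unbiased estimator of the variance $V$, even though $\sqrt{\hat V}$ is not an unbiased estimator of $\sqrt{V}$. Because the square root is concave, Jensen's inequality gives $\mathbb{E}[\sqrt{\hat V}] \le \sqrt{\mathbb{E}[\hat V]} = \sqrt{V}$, with strict inequality unless $\hat V$ is constant, so with ddof = 1 the sample standard deviation systematically underestimates the true one. That is the bias you see with ddof = 1.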
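For normally distributed data the size of this bias is known exactly. This is a standard result added here as background, not something from the original post: with $s$ the ddof = 1 sample standard deviation of $N$ i.i.d. normal draws and $\sigma = \sqrt{V}$,

$$
\mathbb{E}[s] = c_4(N)\,\sigma, \qquad c_4(N) = \sqrt{\frac{2}{N-1}}\;\frac{\Gamma\!\left(\frac{N}{2}\right)}{\Gamma\!\left(\frac{N-1}{2}\right)} < 1 .
$$

For $N = 5$ this gives $c_4 \approx 0.940$ (about a 6% underestimate), and for $N = 20$ about $0.987$, so the bias shrinks as $N$ grows. It depends on the normality assumption; for other distributions the factor differs.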
This is actually stated in the documentation of `np.std`: link (see the “Notes” section).
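A quick numerical check of that note, as a sketch: it assumes NumPy's `default_rng` generator with an arbitrary seed instead of `scipy.stats.norm.rvs`, and computes the normal-theory factor $c_4(N)$ with `math.lgamma`. The helper names `c4` and `simulate` are hypothetical. Averaged over many samples, $s^2$ should come out close to 1 (unbiased), while $s$ should sit near $c_4(N)$ rather than 1.

```python
import math
import numpy as np

def c4(n):
    """Normal-theory factor with E[s] = c4(n) * sigma, where s uses ddof=1."""
    # lgamma avoids overflowing Gamma(n/2) for large n
    return math.sqrt(2.0 / (n - 1)) * math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2))

def simulate(size, reps=50000, seed=0):
    """Mean of s**2 and mean of s over `reps` samples of `size` N(0, 1) draws."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((reps, size))  # one sample per row, sigma = 1
    s = data.std(axis=1, ddof=1)
    return (s ** 2).mean(), s.mean()

for size in (5, 10, 20):
    mean_var, mean_sd = simulate(size)
    # mean_var should be near 1 (unbiased); mean_sd near c4(size), i.e. below 1
    print(f"N={size:2d}  mean s^2={mean_var:.3f}  mean s={mean_sd:.3f}  "
          f"c4={c4(size):.3f}  mean s/c4={mean_sd / c4(size):.3f}")
```

The last column should be close to 1: dividing by $c_4(N)$ makes $s$ approximately unbiased for $\sigma$, but only if the data really are normal.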