I am comparing two distributions, such as: group1 = [ 0, 0, 0, 1,


Editorial Team

Asked: June 16, 2026

I am comparing two distributions, such as:

group1 = [ 0, 0, 0, 1, 11, 11, 13, 12]

group2 = [ 0, 0, 0, 0, 5, 11, 18, 14]

My distributions don’t have many elements, and I am not sure chi-square is the best approach, but from what I have read it still seems the best of the tests I have seen.

The problem is that each chi-square implementation I try gives a different result:

If I use:

import numpy as np

import scipy.stats.mstats as mst
mst.chisquare(np.array(group1), np.array(group2))

the result is (8.874603174603175, 0.26178489290758555).

If I use:

import scipy.stats as stat
stat.chisquare(np.array(group1), np.array(group2))

I get (nan, nan).

And if I remove the elements that are 0 in both groups, so that my groups look like this:

group1 = [ 1, 11, 11, 13, 12]

group2 = [ 0, 5, 11, 18, 14]

using:

mst.chisquare(np.array(group1), np.array(group2))

gives (8.874603174603175, 0.06431137995249224).

I am confused by this ambiguity. What is the correct p-value for my distributions?

1 Answer

Editorial Team · Answer 1 · 2026-06-16T00:08:29+00:00

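I suspect this is a bug in the scipy.stats.mstats module. mstats is meant to handle masked arrays (arrays with invalid or missing values) better than stats does. In this case, however, it seems to count the degrees of freedom (DOF) incorrectly. The chi-square statistic (the first return value of chisquare) is the same before and after removing the zeros, so only the DOF can account for the different p-values: the reported p-values correspond to 7 DOF for the 8-element arrays and 4 DOF for the 5-element arrays, i.e. the number of elements minus one, regardless of how many elements were masked.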
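One way to check the DOF explanation, assuming SciPy is installed, is to feed the shared statistic into the chi-square survival function `scipy.stats.chi2.sf` with different DOF values. This is a minimal sketch: the DOF values 7 and 4 are inferred from the element counts (n - 1), not read from the mstats source.

```python
from scipy import stats

# Chi-square statistic reported by mstats.chisquare in both runs
stat = 8.874603174603175

# DOF inferred as (number of elements - 1) for the 8- and 5-element inputs
p_8_elements = stats.chi2.sf(stat, df=7)
p_5_elements = stats.chi2.sf(stat, df=4)

print(p_8_elements)  # approximately 0.2618, as reported for the full arrays
print(p_5_elements)  # approximately 0.0643, as reported after dropping shared zeros
```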

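Note that even after removing the 0s present in both arrays you will still get an infinity, because computing the chi-square statistic requires dividing by the expected frequencies (group2 in your arrays; see Wikipedia), and group2 still has a 0 where group1 has a 1. In the original arrays, the three cells that are 0 in both groups also give 0/0 = nan, which is why stats.chisquare returns (nan, nan). mstats masks these invalid values, but it does not adjust the DOF accordingly: because fewer elements contribute to the statistic, the DOF should be reduced by the number of masked elements.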
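The sketch below, assuming NumPy and SciPy, reproduces this cell by cell: it computes the Pearson terms (O - E)^2 / E with group2 as the expected frequencies, shows the nan and inf cells caused by the zeros in group2, and sums only the finite cells. The DOF line applies the correction suggested above (valid cells minus one); it illustrates that reasoning and is not SciPy's own behaviour. The names `terms`, `valid` and `dof` are just illustrative.

```python
import numpy as np
from scipy import stats

group1 = np.array([0, 0, 0, 1, 11, 11, 13, 12], dtype=float)
group2 = np.array([0, 0, 0, 0, 5, 11, 18, 14], dtype=float)  # used as expected frequencies

# Per-cell Pearson terms; silence the divide-by-zero warnings so the nan/inf are visible
with np.errstate(divide="ignore", invalid="ignore"):
    terms = (group1 - group2) ** 2 / group2

print(terms)  # first three cells are nan (0/0), fourth is inf (1/0)

# Keep only the finite cells, roughly what masking the invalid values achieves
valid = np.isfinite(terms)
stat = terms[valid].sum()

# Suggested correction (an assumption, not SciPy behaviour): DOF from the cells actually used
dof = int(valid.sum()) - 1
p = stats.chi2.sf(stat, dof)
print(stat, dof, p)
```

Keep in mind that dropping the cell where group1 has 1 and group2 has 0 also discards that observation, so the remaining cells no longer have equal totals (47 versus 48); the sketch only illustrates the DOF bookkeeping.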

I hope this clarifies things a bit. Please consider reporting the bug on the SciPy discussion list.
