I use masked arrays all the time in my work, but I find initializing them a bit clunky. Specifically, ma.zeros() and ma.empty() return masked arrays whose mask doesn’t match the shape of the array. I want a full-shape mask so that any element I don’t assign is masked by default.
In [4]: A=ma.zeros((3,))
...
masked_array(data = [ 0. 0. 0.],
mask = False,
fill_value = 1e+20)
I can subsequently assign the mask:
In [6]: A.mask=ones((3,))
...
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 1e+20)
But why should I have to use two lines to initialize an array? Alternatively, I can ignore the ma.zeros() functionality and specify the mask and data in one line:
In [8]: A=ma.masked_array(zeros((3,)),mask=ones((3,)))
...
masked_array(data = [-- -- --],
mask = [ True True True],
fill_value = 1e+20)
But I think this is also clunky. I have trawled through the numpy.ma documentation but I can’t find a neat way of dealing with this. Have I missed something obvious?
Well, the mask in ma.zeros is actually a special constant, ma.nomask, that corresponds to np.bool_(False). It’s just a placeholder telling NumPy that the mask hasn’t been set.
Using nomask actually speeds up np.ma significantly: there is no need to keep track of where the masked values are if we know beforehand that there are none.
The best approach is not to set your mask explicitly if you don’t need it, and to let np.ma set it when needed (i.e., when you end up trying to take the log of a negative number).
Side note #1: to set the mask to an array of False with the same shape as your input, pass mask=False when creating the array. That’s easier to type. Note that it’s really the Python False, not np.ma.nomask… Similarly, use mask=True to force all your inputs to be masked (i.e., mask will be a bool ndarray full of True, with the same shape as the data).
Side note #2: if you need to set the mask after initialization, you shouldn’t use an assignment to .mask but assign the special value np.ma.masked instead; it’s safer.
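Here is a minimal sketch of the points above, assuming a reasonably recent NumPy imported as np. The variable names are only illustrative. The last line uses np.ma.masked_all, which is not mentioned in the answer above; I am adding it as a possible one-line way to get a fully masked array, so treat it as a suggestion rather than part of the original advice.

```python
import numpy as np

# ma.zeros leaves the mask as the nomask placeholder (no per-element mask).
a = np.ma.zeros((3,))
print(a.mask is np.ma.nomask)   # True

# mask=False (the Python False) expands to a full array of False.
b = np.ma.array(np.zeros((3,)), mask=False)
print(b.mask.tolist())          # [False, False, False]

# mask=True masks every element in a single call.
c = np.ma.array(np.zeros((3,)), mask=True)
print(c.mask.tolist())          # [True, True, True]

# Assigning a plain value unmasks that element (default soft mask),
# so anything never assigned stays masked.
c[0] = 5.0
print(c.mask.tolist())          # [False, True, True]

# Masking after initialization by assigning the special value np.ma.masked.
d = np.ma.zeros((3,))
d[:] = np.ma.masked
print(d.mask.tolist())          # [True, True, True]

# Assumption: np.ma.masked_all is an addition of mine, not from the answer.
# It returns an array of the given shape with every element masked; the
# underlying data is uninitialized, like np.empty.
e = np.ma.masked_all((3,))
print(e.mask.tolist())          # [True, True, True]
```

With mask=True, the behaviour asked for in the question falls out naturally: elements start masked and become visible only once you assign to them.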