sorted([2, float('nan'), 1]) returns [2, nan, 1]
(At least on Activestate Python 3.1 implementation.)
I understand nan is a weird object, so I wouldn’t be surprised if it shows up in random places in the sort result. But it also messes up the sort for the non-nan numbers in the container, which is really unexpected.
I asked a related question about max, and based on that I understand why sort works like this. But should this be considered a bug?
Documentation just says “Return a new sorted list […]” without specifying any details.
EDIT:
I now agree that this isn’t in violation of the IEEE standard. However, it’s a bug from any common sense viewpoint, I think. Even Microsoft, which isn’t known to admit their mistakes often, has recognized this one as a bug, and fixed it in the latest version: http://connect.microsoft.com/VisualStudio/feedback/details/363379/bug-in-list-double-sort-in-list-which-contains-double-nan.
Anyway, I ended up following @khachik’s answer:
sorted(list_, key = lambda x : float('-inf') if math.isnan(x) else x)
I suspect it results in a performance hit compared to the language doing that by default, but at least it works (barring any bugs that I introduced).
The previous answers are useful, but perhaps not clear regarding the root of the problem.
In any language, sort applies a given ordering, defined by a comparison function or in some other way, over the domain of the input values. For example, less-than, a.k.a.
operator <,could be used throughout if and only if less than defines a suitable ordering over the input values.But this is specifically NOT true for floating point values and less-than:
“NaN is unordered: it is not equal to, greater than, or less than anything, including itself.” (Clear prose from GNU C manual, but applies to all modern
IEEE754based floating point)So the possible solutions are:
Either approach can be used, in any language.
Practically, considering python, I would prefer to remove the NaNs if you either don’t care much about fastest performance or if removing NaNs is a desired behavior in context.
Otherwise you could use a suitable predicate function via “cmp” in older python versions, or via this and
functools.cmp_to_key(). The latter is a bit more awkward, naturally, than removing the NaNs first. And care will be required to avoid worse performance, when defining this predicate function.