If I have this numpy array:
>>> a
array([[ 1,  2,  3],
       [ 4,  4,  6],
       [ 4, 10,  9]])
What’s the fastest way to select all the rows in which a condition holds for at least N elements?
For example, select all the rows where at least two numbers are evenly divisible by 2.
The solution I came up with is:
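import numpy
# find rows where 2 or more elements are evenly divisible by two
N = 2
# on Python 3, map returns a lazy iterator, so wrap it in list() before building the array
a[numpy.where(numpy.array(list(map(lambda x: sum(x), a % 2 == 0))) >= N)]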
An alternative solution using apply_along_axis is:
a[numpy.where(numpy.sum(numpy.apply_along_axis(lambda x: x % 2 == 0, 1, a), axis=1) >= N)]
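The lambda handed to apply_along_axis already works elementwise on a whole row, so the call only reproduces what the vectorized expression a % 2 == 0 computes directly, at the cost of one Python-level call per row. A quick check on the example array (assuming the array from the question):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 4, 6],
              [4, 10, 9]])

# apply_along_axis calls the Python lambda once per row (axis 1)...
mask_apply = np.apply_along_axis(lambda x: x % 2 == 0, 1, a)
# ...but the lambda is itself elementwise, so one vectorized expression gives the same mask
mask_vec = a % 2 == 0

print(np.array_equal(mask_apply, mask_vec))  # True
```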
Is there a more elegant/faster way in numpy/scipy than these? If not, which of the two above is better?
I’d probably just sum the boolean mask a % 2 == 0 along each row and compare those counts with N,
which works because True/False have integer values of 1/0. For comparison:
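A minimal sketch of that boolean-sum approach, assuming the array and N from the question, checked against the two approaches from the question (with list() added around map so they run on Python 3). The variable names are illustrative only:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 4, 6],
              [4, 10, 9]])
N = 2

# Boolean mask of elements satisfying the condition (even numbers here)
mask = a % 2 == 0
# Summing booleans counts the True values, since True == 1 and False == 0
counts = mask.sum(axis=1)  # -> [1, 3, 2]
# Boolean indexing keeps only the rows with at least N matches
result = a[counts >= N]

# The two approaches from the question, for comparison
via_map = a[np.where(np.array(list(map(lambda x: sum(x), a % 2 == 0))) >= N)]
via_apply = a[np.where(np.sum(np.apply_along_axis(lambda x: x % 2 == 0, 1, a), axis=1) >= N)]

assert np.array_equal(result, via_map)
assert np.array_equal(result, via_apply)
print(result)
```

Boolean indexing with counts >= N also makes the numpy.where call unnecessary, since a boolean array can index the rows directly.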
Note that using lambdas costs you much of the benefit of using numpy in the first place, and
lambda x: sum(x) is simply a more verbose and slower way of writing sum here anyway. Also note that if the array were large, it’d probably be more efficient to use a method that can short-circuit (stop examining a row once its outcome is settled) rather than the above.
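One possible short-circuiting strategy, sketched under assumptions (this is not the answer's own code, and rows_with_at_least is a hypothetical helper name): scan the array one column at a time, keep a running count per row, and stop as soon as every row has either reached N matches or can no longer reach N with the columns left. Each step is still vectorized across rows, so this would only be expected to pay off when there are many columns and most rows are decided early; a per-element early exit would need compiled code.

```python
import numpy as np


def rows_with_at_least(a, n, cond):
    """Hypothetical helper: return the rows of 2-D array `a` in which
    `cond` (an elementwise, vectorized predicate) holds for >= n elements.

    Processes one column at a time and stops early once every row is decided.
    """
    n_rows, n_cols = a.shape
    counts = np.zeros(n_rows, dtype=np.intp)
    for j in range(n_cols):
        # Add 1 to each row whose element in column j satisfies the condition
        counts += cond(a[:, j])
        remaining = n_cols - j - 1
        # A row is decided if it already has n matches, or cannot reach n
        # even if every remaining column matched
        decided = (counts >= n) | (counts + remaining < n)
        if decided.all():
            break
    return a[counts >= n]


a = np.array([[1, 2, 3],
              [4, 4, 6],
              [4, 10, 9]])
result = rows_with_at_least(a, 2, lambda col: col % 2 == 0)
print(result)
```

Rows that are already decided keep being counted until all rows are decided; that is harmless, because a row that has reached n stays at or above n, and a row that cannot reach n is excluded either way.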