In NumPy,
>>> import numpy as np
>>> foo = np.array([[i+10*j for i in range(10)] for j in range(3)])
>>> foo
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
>>> filter = np.nonzero(foo > 100)  # nothing matches
>>> foo[:,filter]
array([], shape=(3, 2, 0), dtype=int64)
>>> foo[:,0:0]
array([], shape=(3, 0), dtype=int64)
>>> filter2 = np.nonzero(np.sum(foo,axis=0) < 47)
>>> foo[:,filter2]
array([[[ 0,  1,  2,  3,  4,  5]],

       [[10, 11, 12, 13, 14, 15]],

       [[20, 21, 22, 23, 24, 25]]])
>>> foo[:,filter2].shape
(3, 1, 6)
I have a ‘filter’ condition where I want to perform an operation on all rows for all matching columns, but if filter is an empty array, somehow my foo[:,filter] gets broadcast into a 3D array. Another example is with filter2 -> again, foo[:,filter2] gives me a 3D array when I am expecting the result of foo[:,(np.sum(foo,axis=0) < 47)]
Can someone explain what the proper use case of np.nonzero is compared to using booleans to find the correct columns/indices?
First, foo[filter] == foo[filter.nonzero()] when filter is a Boolean array.
To understand why you’re getting unexpected results, you have to understand a little about how Python does indexing. To do multidimensional indexing in Python you can either use indices in [], separated by commas, or use a tuple. So foo[1, 2, 3] is the same as foo[(1, 2, 3)]. With this in mind, take a look at what happens when you do foo[:, something]. I believe in your example you were trying to get foo[:, something[0], something[1]], but instead you got foo[(slice(None), (something[0], something[1]))].
This is all somewhat academic, because if you’re just using filter for indexing you probably don’t need to use nonzero; just use the boolean array as the index. But if you need to, you can do something like:
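A minimal sketch of that idea, assuming the goal is to select columns of the 2-D foo from the question. The names col_mask, col_idx, by_mask, by_index, by_tuple and empty are illustrative and do not appear in the original post. np.nonzero returns a tuple with one index array per dimension of its input, so the tuple has to be unpacked or spliced into the index tuple rather than passed as a single index.

```python
import numpy as np

foo = np.array([[i + 10 * j for i in range(10)] for j in range(3)])

# Boolean mask over columns: True where the column sum is below 47.
col_mask = foo.sum(axis=0) < 47

# Option 1: index with the boolean mask directly (usually the simplest).
by_mask = foo[:, col_mask]

# Option 2: nonzero returns a 1-tuple for a 1-D mask, so take element 0
# to get a plain integer index array.
col_idx = np.nonzero(col_mask)[0]
by_index = foo[:, col_idx]

# Option 3: splice the nonzero tuple into an explicit index tuple,
# which is what foo[:, col_idx] means under the hood.
by_tuple = foo[(slice(None),) + np.nonzero(col_mask)]

# An empty selection stays 2-D with any of these forms.
empty = foo[:, foo.sum(axis=0) > 1000]

print(by_mask.shape, by_index.shape, by_tuple.shape, empty.shape)
```

All three options select the same columns and keep the result 2-D; passing the whole nonzero tuple as one index (foo[:, filter2]) is what adds the extra axis.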