I’m trying to find the fastest way to find the index of the first non-zero value in each row of a two-dimensional sorted array. Technically, the only values in the array are zeros and ones, and each row is “sorted” (all of its zeros come before its ones).
For instance, the array could look like the following:
v =
0 0 0 1 1 1 1
0 0 0 1 1 1 1
0 0 0 0 1 1 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 1
0 0 0 0 0 0 0
I could use the argmax function
argmax(v, axis=1)
to find when it changes from zero to one, but I believe this would do an exhaustive search along each row. My array will be reasonably sized (~2000×2000). Would argmax still outperform just doing a searchsorted approach for each row within a for loop, or is there a better alternative?
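Here is a minimal sketch of the two options being compared, run on the example array above. It also includes a third, vectorized option that is not from the question: because each row is zeros followed by ones, the number of zeros in a row equals the index of its first one. That shortcut assumes the values really are only 0 and 1.

```python
import numpy as np

# The example array from the question: each row is zeros, then ones.
v = np.array([
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
])

# Vectorized: argmax returns the index of the first maximum in each row.
# For an all-zero row it returns 0, which is ambiguous.
first_argmax = np.argmax(v, axis=1)

# Per-row binary search: on a sorted 0/1 row, searchsorted(row, 1) gives the
# index of the first 1, or len(row) if the row contains no 1 at all.
first_search = np.array([np.searchsorted(row, 1) for row in v])

# Swapped-in alternative (not from the question): the count of zeros in a
# sorted 0/1 row is the index of its first 1.
first_count = v.shape[1] - v.sum(axis=1)

print(first_argmax)  # [3 3 4 6 6 6 0]
print(first_search)  # [3 3 4 6 6 6 7]
print(first_count)   # [3 3 4 6 6 6 7]
```

The trade-off is that argmax and the row sum touch every element in compiled code, while searchsorted does O(log n) work per row but pays one Python-level call per row.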
Also, the first position of a one in each row is always >= the first position of a one in the row above it (but there is no guarantee that the last few rows contain a one at all). I could exploit this with a for loop that uses a “starting index” for each row equal to the position of the first 1 in the previous row, but am I correct in thinking that NumPy’s argmax function will still outperform a loop written in Python?
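The “starting index” idea might look roughly like the following sketch. The function name first_ones_monotone is hypothetical, and the function relies on the stated guarantee that the first-1 position never moves left from one row to the next. Rows without a 1 are reported as the row length, not None.

```python
import numpy as np


def first_ones_monotone(v):
    """Hypothetical helper: index of the first 1 in each row of a row-sorted
    0/1 array whose first-1 positions never decrease down the rows.
    Rows with no 1 get v.shape[1]."""
    n_rows = v.shape[0]
    out = np.empty(n_rows, dtype=np.intp)
    start = 0  # offset carried over from the previous row
    for i in range(n_rows):
        # Search only the tail of the row: the first 1 cannot lie to the
        # left of the previous row's first 1.
        start += int(np.searchsorted(v[i, start:], 1))
        out[i] = start
    return out


v = np.array([
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
])

print(first_ones_monotone(v))  # [3 3 4 6 6 6 7]
```

This still makes one Python-level call per row, so whether it beats a single np.argmax(v, axis=1) call at a given size is something to measure rather than assume.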
I would just benchmark the alternatives, but the edge length of the array could change quite a bit (from 250 to 10,000).
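Since the edge length varies, one option is to generate random test arrays with the same structure and compare approaches at several sizes. The helper make_staircase below is hypothetical. It draws non-decreasing first-1 positions and builds the array by broadcasting, and the loop then checks that three approaches agree with those positions.

```python
import numpy as np


def make_staircase(n, seed=0):
    """Hypothetical helper: random n x n 0/1 array with sorted rows and
    non-decreasing first-1 positions (position n means 'no 1 in the row')."""
    rng = np.random.default_rng(seed)
    # Sorted draws from [0, n] give the non-decreasing first-1 positions.
    firsts = np.sort(rng.integers(0, n + 1, size=n))
    cols = np.arange(n)
    # Broadcasting: entry (i, j) is 1 exactly when j >= firsts[i].
    v = (cols[None, :] >= firsts[:, None]).astype(np.int8)
    return v, firsts


for n in (250, 1000, 2000):
    v, firsts = make_staircase(n, seed=n)
    via_search = np.array([np.searchsorted(row, 1) for row in v])
    via_count = n - v.sum(axis=1)
    # Replace argmax's 0 for all-zero rows with n so all methods are comparable.
    via_argmax = np.where(v.any(axis=1), np.argmax(v, axis=1), n)
    assert np.array_equal(via_search, firsts)
    assert np.array_equal(via_count, firsts)
    assert np.array_equal(via_argmax, firsts)
```

At n = 10,000 the int8 array alone is about 100 MB, plus a boolean temporary of the same size while it is built, so the largest sizes may need some care with memory.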
It is reasonably fast to use np.where(v > 0).
That returns a tuple of two arrays holding the row and column coordinates of the values greater than 0.
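Here is a sketch of what np.where returns for the example array. The np.unique step is an addition, not part of the answer: np.where reports coordinates in row-major order, so the first occurrence of each row number gives that row’s first non-zero column.

```python
import numpy as np

v = np.array([
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
])

rows, cols = np.where(v > 0)
# rows.tolist() -> [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 4, 5]
# cols.tolist() -> [3, 4, 5, 6, 3, 4, 5, 6, 4, 5, 6, 6, 6, 6]

# return_index gives the position of the first occurrence of each row number.
hit_rows, first_pos = np.unique(rows, return_index=True)
first_cols = cols[first_pos]
# hit_rows.tolist() -> [0, 1, 2, 3, 4, 5]
# first_cols.tolist() -> [3, 3, 4, 6, 6, 6]
# Row 6 never appears in rows because it has no value > 0.
```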
You can also use np.where to test each sub array:
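Here is a sketch of the per-row test, assuming the example array from the question. For each row it prints the row number and the first column with a value greater than 0, or None if the row has none.

```python
import numpy as np

v = np.array([
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
])

firsts = []
for i, row in enumerate(v):
    # Column indices in this row that hold a value > 0.
    idx = np.where(row > 0)[0]
    # Keep the first one, or None when the row is all zeros.
    first = int(idx[0]) if idx.size else None
    firsts.append(first)
    print(i, first)

# firsts -> [3, 3, 4, 6, 6, 6, None]
```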
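That prints, for each row, the first index whose value is greater than 0 (or an empty result when there is none), i.e., row 0: index 3; row 2: index 4; row 6: no index greater than 0.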
As you suspect, argmax may be faster:
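A per-row argmax version might look like this sketch, assuming the 0/1 example array. The any() check is what produces None for an all-zero row. Without it, argmax alone would return 0 for that row.

```python
import numpy as np

v = np.array([
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
])

# argmax returns the first index of the row's maximum, i.e. the first 1;
# rows without any non-zero value are mapped to None instead.
firsts = [int(np.argmax(row)) if row.any() else None for row in v]
# firsts -> [3, 3, 4, 6, 6, 6, None]
```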
If you can deal with the logic of not having a None for rows of all zeros (plain argmax returns 0 for such rows), this is faster still.
And here is a version that uses axis in argmax (as suggested in your comments):
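The following sketch shows both variants on the example array. Without an all-zeros check, an all-zero row reports index 0, which looks the same as a row whose first 1 is in column 0. The last step shows one way to mask those rows, where the -1 sentinel is an arbitrary choice for illustration.

```python
import numpy as np

v = np.array([
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 0, 0],
])

# Per-row argmax with no None handling: the all-zero row comes back as 0.
loop_firsts = [int(np.argmax(row)) for row in v]
# loop_firsts -> [3, 3, 4, 6, 6, 6, 0]

# One call over the whole array with axis=1 (no Python-level loop).
axis_firsts = np.argmax(v, axis=1)
# axis_firsts.tolist() -> [3, 3, 4, 6, 6, 6, 0]

# Distinguish "no 1 in this row" using any(); -1 is an illustrative sentinel.
safe_firsts = np.where(v.any(axis=1), axis_firsts, -1)
# safe_firsts.tolist() -> [3, 3, 4, 6, 6, 6, -1]
```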
For speed comparisons (on your example array), I get this:
If I scale that to a 2000×2000 NumPy array, here is what I get:
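Here is a sketch of a timeit harness for rerunning this kind of comparison at 2000×2000 or any other edge length. All helper names are hypothetical. The test array is random but has the same row-sorted, non-decreasing structure described in the question, and absolute timings will depend on the machine and NumPy version.

```python
import functools
import timeit

import numpy as np


def make_staircase(n, seed=0):
    """Hypothetical helper: random n x n 0/1 array with sorted rows and
    non-decreasing first-1 positions (position n means 'no 1 in the row')."""
    rng = np.random.default_rng(seed)
    firsts = np.sort(rng.integers(0, n + 1, size=n))
    return (np.arange(n)[None, :] >= firsts[:, None]).astype(np.int8)


def per_row_argmax(v):
    # One argmax per row; None for all-zero rows.
    return [int(np.argmax(row)) if row.any() else None for row in v]


def axis_argmax(v):
    # Single vectorized call; all-zero rows come back as 0.
    return np.argmax(v, axis=1)


def per_row_searchsorted(v):
    # Binary search per row; all-zero rows come back as v.shape[1].
    return [int(np.searchsorted(row, 1)) for row in v]


def monotone_searchsorted(v):
    # Binary search starting at the previous row's first-1 position.
    out, start = [], 0
    for row in v:
        start += int(np.searchsorted(row[start:], 1))
        out.append(start)
    return out


CANDIDATES = {
    "per_row_argmax": per_row_argmax,
    "axis_argmax": axis_argmax,
    "per_row_searchsorted": per_row_searchsorted,
    "monotone_searchsorted": monotone_searchsorted,
}


def bench(n, number=5, repeat=3):
    """Best-of-`repeat` seconds per call for each candidate on an n x n array."""
    v = make_staircase(n)
    results = {}
    for name, func in CANDIDATES.items():
        times = timeit.repeat(functools.partial(func, v), number=number, repeat=repeat)
        results[name] = min(times) / number
    return results


if __name__ == "__main__":
    for name, secs in bench(2000).items():
        print(f"{name:>22}: {secs * 1e3:.3f} ms per call")
```

Taking the minimum over repeats is the usual way to reduce noise from other processes. Changing the argument to bench covers the 250 to 10,000 range from the question.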