I’ve got a NumPy array containing labels. I’d like to calculate a number for each label based on its size and bounding box. How can I write this more efficiently so that it’s realistic to use on large arrays (~15000 labels)?
from numpy import array, zeros, argwhere

A = array([[1, 1, 0, 3, 3],
           [1, 1, 0, 0, 0],
           [1, 0, 0, 2, 2],
           [1, 0, 2, 2, 2]])
B = zeros(4)
for label in range(1, 4):
    # get the bounding box of the label
    label_points = argwhere(A == label)
    (y0, x0), (y1, x1) = label_points.min(0), label_points.max(0) + 1
    # assume I've computed the size of each label in a numpy array size_A
    B[label] = myfunc(y0, x0, y1, x1, size_A[label])
I wasn’t really able to implement this efficiently using NumPy’s vectorised functions, so maybe a clever pure-Python implementation will be faster.
This function returns a dictionary mapping each label to the index of the first row it appears in. Applying the function to A, A.T, A[::-1] and A.T[::-1] also gives you the first column as well as the last row and column (for the reversed views, the returned indices count from the opposite edge).

If you would rather have a list instead of a dictionary, you can turn the dictionary into a list using list(map(d.get, labels)). Alternatively, you can use a NumPy array instead of a dictionary right from the start, but you will lose the ability to leave the loop early as soon as all labels have been found.

I’d be interested in whether (and by how much) this actually speeds up your code, but I’m confident that it is faster than your original solution.
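
The approach described above can be sketched as follows. This is only an illustration, not necessarily the code the answer originally had in mind: the function name first_row, its signature, and the way the four results are combined into bounding boxes are all assumptions, and the labels are assumed to be the integers 1 to 3 from the question's example array.

```python
import numpy as np

def first_row(a, labels):
    """Map each label to the index of the first row of `a` that contains it.

    Hypothetical helper: name and signature are illustrative only.
    """
    remaining = set(labels)
    d = {}
    for i, row in enumerate(a):
        # labels seen for the first time in this row
        found = remaining.intersection(row.tolist())
        for label in found:
            d[label] = i
        remaining -= found
        if not remaining:
            break  # leave the loop early once every label was found
    return d

A = np.array([[1, 1, 0, 3, 3],
              [1, 1, 0, 0, 0],
              [1, 0, 0, 2, 2],
              [1, 0, 2, 2, 2]])
labels = range(1, 4)

y0 = first_row(A, labels)    # first row of each label
x0 = first_row(A.T, labels)  # first column of each label
# Reversed views count from the far edge, so shape - index gives the
# exclusive upper bound (the same value as max + 1 in the question).
y1 = {k: A.shape[0] - v for k, v in first_row(A[::-1], labels).items()}
x1 = {k: A.shape[1] - v for k, v in first_row(A.T[::-1], labels).items()}

boxes = {k: (y0[k], x0[k], y1[k], x1[k]) for k in labels}
```

Each entry of boxes holds (y0, x0, y1, x1) in the same convention as the loop in the question, so it can be passed straight to myfunc together with size_A[label]. Whether this beats the argwhere loop on a real array with ~15000 labels would need to be measured.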