I’m working with a page segmentation algorithm. Its output is an image in which the pixels of each zone are assigned a unique color. I’d like to process the image to find the bounding boxes of the zones: find all the colors, then all the pixels of each color, then the bounding box of those pixels.
The following is an example image.

I’m currently starting with histograms of the R, G, B channels. The histograms tell me which intensity values occur in each channel.
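import numpy as np
from PIL import Image

img = Image.open(imgfilename)
img.load()
r, g, b = img.split()
ra, ga, ba = [np.asarray(p, dtype="uint8") for p in (r, g, b)]
# range=(0, 256) makes each of the 256 bins cover exactly one intensity value
rhist, edges = np.histogram(ra, bins=256, range=(0, 256))
ghist, edges = np.histogram(ga, bins=256, range=(0, 256))
bhist, edges = np.histogram(ba, bins=256, range=(0, 256))
# indices of the non-empty bins, i.e. the intensities present in each channel
print(np.nonzero(rhist))
print(np.nonzero(ghist))
print(np.nonzero(bhist))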
Output:
(array([ 0, 1, 128, 205, 255]),)
(array([ 0, 20, 128, 186, 255]),)
(array([ 0, 128, 147, 150, 255]),)
I’m a little flummoxed at this point. By visual inspection, I have colors (0,0,0), (1,0,0), (0,20,0), (128,128,128), etc. How should I permute the nonzero outputs into pixel values for np.where()?
I’m considering flattening the 3 x row x col ndarray into a 2-D plane of 24-bit packed RGB values (r<<16|g<<8|b) and searching that array. That seems brute force and inelegant. Is there a better way in NumPy to find bounding boxes of a color value?
There is no reason to treat this as an RGB color image; it is simply a visualization of a segmentation that someone else did. You can just as easily treat it as a grayscale image, and for these specific colors you don’t have to do anything else yourself.
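The grayscale idea can be sketched as follows. This is a minimal illustration, not the original answer's code: the helper name `bounding_boxes` and the choice of 255 as the background value are assumptions, and a small synthetic array stands in for the result of `convert('L')`.

```python
import numpy as np

def bounding_boxes(labels, background=255):
    """Map each value in a 2-D array to (xmin, ymin, xmax, ymax), inclusive."""
    boxes = {}
    for value in np.unique(labels):
        if value == background:
            continue  # assumption: this value marks "no zone" (white page)
        rows, cols = np.nonzero(labels == value)
        boxes[int(value)] = (int(cols.min()), int(rows.min()),
                             int(cols.max()), int(rows.max()))
    return boxes

# Synthetic stand-in for a grayscale segmentation image: two zones on white.
gray = np.full((6, 8), 255, dtype=np.uint8)
gray[1:3, 1:4] = 0      # zone drawn with gray level 0
gray[3:6, 5:8] = 128    # zone drawn with gray level 128

boxes = bounding_boxes(gray)
print(boxes)
```

With a real file, `gray` would come from something like `np.asarray(Image.open(path).convert('L'))`, where `path` is a placeholder. It is worth checking that the number of unique gray levels matches the number of zones, since very similar colors such as (0,0,0) and (1,0,0) can land on the same gray level.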
If you cannot rely on convert('L') giving unique colors (i.e., you are using other colors beyond the ones in the given image), you can pack your image and obtain the unique colors.
I would also recommend removing small connected components before finding the bounding boxes, by the way.
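For the packing approach, a sketch under stated assumptions might look like this. The helper names `pack_rgb` and `unpack_rgb` are hypothetical, the input is assumed to be a (rows, cols, 3) uint8 array such as `np.asarray(img)` for an RGB image, and a synthetic array is used here so the example is self-contained.

```python
import numpy as np

def pack_rgb(rgb):
    """Pack a (rows, cols, 3) uint8 array into one 24-bit integer per pixel."""
    rgb = rgb.astype(np.uint32)  # widen first so the shifts cannot overflow
    return (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]

def unpack_rgb(value):
    """Turn a packed 24-bit value back into an (r, g, b) tuple."""
    value = int(value)
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

# Synthetic stand-in for the segmentation image: two zones on white.
rgb = np.full((6, 8, 3), 255, dtype=np.uint8)
rgb[1:3, 1:4] = (0, 0, 0)
rgb[3:6, 5:8] = (1, 0, 0)  # luma of about 0.3, so it would merge with black in grayscale

packed = pack_rgb(rgb)
color_boxes = {}
for value in np.unique(packed):
    rows, cols = np.nonzero(packed == value)
    # (xmin, ymin, xmax, ymax), inclusive
    color_boxes[unpack_rgb(value)] = (int(cols.min()), int(rows.min()),
                                      int(cols.max()), int(rows.max()))
print(color_boxes)
```

The packed array is 2-D, so each unique value identifies one full RGB color and `np.nonzero` on the comparison gives the pixel coordinates directly. The background color (white in this sketch) also gets a box and can be dropped from the result if it is not a zone.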