I’m working with a page segmentation algorithm. The output of the code writes an

Question

Asked: June 17, 2026 (2026-06-17T23:23:45+00:00)

I’m working with a page segmentation algorithm. The output of the code writes an image with the pixels of each zone assigned a unique color. I’d like to process the image to find the bounding boxes of the zones. I need to find all the colors, then find all the pixels of that color, then find their bounding box.

The following is an example image.

[Image: example output image showing colored zones]

I’m currently starting with histograms of the R, G, B channels. The histograms tell me which values occur in each channel.

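import numpy as np
from PIL import Image

img = Image.open(imgfilename)  # imgfilename: path to the segmentation output image
img.load()
r,g,b = img.split()

# One uint8 array per channel.
ra,ga,ba = [ np.asarray(p,dtype="uint8") for p in (r,g,b) ]

rhist,edges = np.histogram(ra,bins=256)
ghist,edges = np.histogram(ga,bins=256)
bhist,edges = np.histogram(ba,bins=256)
# Indices of the non-empty bins, i.e. the values present in each channel.
print(np.nonzero(rhist))
print(np.nonzero(ghist))
print(np.nonzero(bhist))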

Output:
(array([ 0, 1, 128, 205, 255]),)
(array([ 0, 20, 128, 186, 255]),)
(array([ 0, 128, 147, 150, 255]),)

I’m a little flummoxed at this point. By visual inspection, I have colors (0,0,0),(1,0,0),(0,20,0),(128,128,128),etc. How should I permute the nonzero outputs into pixel values for np.where()?

I’m considering flattening the (3, row, col) ndarray into a 2-D plane of 24-bit packed RGB values (r<<16|g<<8|b) and searching that array. That seems brute force and inelegant. Is there a better way in NumPy to find bounding boxes of a color value?

1 Answer

Editorial Team · Answer 1 · 2026-06-17T23:23:46+00:00

There is no reason to treat this as an RGB color image; it is simply a visualization of a segmentation that someone else did. You can just as easily treat it as a grayscale image, and for these specific colors nothing more is needed, because convert('L') is expected to map each of them to its own gray level.

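import sys
import numpy
from PIL import Image

# Collapse the RGB visualization to a single gray channel.
img = Image.open(sys.argv[1]).convert('L')

im = numpy.array(img)
colors = set(numpy.unique(im))
colors.discard(255)  # drop the white background, if present

for color in colors:
    # numpy.where returns (row, column) indices, i.e. (y, x).
    py, px = numpy.where(im == color)
    print(px.min(), py.min(), px.max(), py.max())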

If you cannot rely on convert('L') giving unique colors (i.e., you are using other colors beyond the ones in the given image), you can pack the three channels of each pixel into a single integer and obtain the unique colors from that:

...
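im = numpy.array(img, dtype=int)  # img must still be RGB here, not converted to 'L'

# Pack the three channels into one 24-bit integer per pixel.
packed = im[:,:,0]<<16 | im[:,:,1]<<8 | im[:,:,2]
colors = set(numpy.unique(packed))
colors.discard(255<<16 | 255<<8 | 255)  # drop the white background, if present

for color in colors:
    py, px = numpy.where(packed == color)
    print(px.min(), py.min(), px.max(), py.max())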
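
As a minimal sketch, the packed-color approach can be wrapped in a function that returns the boxes keyed by color instead of printing them. The function name bounding_boxes_by_color and its background parameter are hypothetical, introduced only for illustration, and the tiny demo array stands in for a real segmentation image.

```python
import numpy as np

def bounding_boxes_by_color(rgb, background=(255, 255, 255)):
    """Map each non-background (r, g, b) color to (xmin, ymin, xmax, ymax).

    Hypothetical helper: rgb is assumed to be a (rows, cols, 3) array of
    8-bit channel values.
    """
    rgb = np.asarray(rgb, dtype=np.uint32)
    # Pack the three channels into one 24-bit integer per pixel.
    packed = (rgb[..., 0] << 16) | (rgb[..., 1] << 8) | rgb[..., 2]
    bg = (background[0] << 16) | (background[1] << 8) | background[2]

    boxes = {}
    for value in np.unique(packed):
        value = int(value)
        if value == bg:
            continue
        # np.nonzero returns (row, column) indices, i.e. (y, x).
        ys, xs = np.nonzero(packed == value)
        color = ((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)
        boxes[color] = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    return boxes

# Synthetic stand-in for a segmentation image: white background, two zones.
demo = np.full((6, 8, 3), 255, dtype=np.uint8)
demo[1:3, 2:5] = (1, 0, 0)
demo[4:6, 0:2] = (0, 20, 0)
boxes = bounding_boxes_by_color(demo)
```

With a real image, the input would be numpy.array(Image.open(path).convert('RGB')). Each color still costs one full scan of the image, which is usually acceptable for a handful of zones.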

I would also recommend removing small connected components before finding the bounding boxes, by the way.
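
One possible way to do that, assuming SciPy is available, is to label the connected components of each zone's mask with scipy.ndimage.label and discard those below a pixel-count threshold. The helper name remove_small_components and the min_size threshold are illustrative choices, not part of the code above.

```python
import numpy as np
from scipy import ndimage

def remove_small_components(mask, min_size):
    """Return a boolean mask without 4-connected blobs smaller than min_size pixels.

    Hypothetical helper; min_size is an arbitrary, data-dependent threshold.
    """
    # Label 0 is the background; components are numbered 1..count.
    labels, count = ndimage.label(mask)
    sizes = np.bincount(labels.ravel(), minlength=count + 1)
    keep = sizes >= min_size
    keep[0] = False  # never keep the background
    return keep[labels]

# Synthetic mask for one color: a 12-pixel zone plus an isolated speck.
mask = np.zeros((6, 8), dtype=bool)
mask[1:4, 1:5] = True
mask[5, 7] = True

cleaned = remove_small_components(mask, min_size=4)
ys, xs = np.nonzero(cleaned)
bbox = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
```

In the loops above, the mask would be im == color or packed == color. Without the cleanup, the single stray pixel in this example would stretch the box to the corner of the image.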
