So I have what is essentially a spreadsheet in TIFF format. There is some uniformity to it; for example, all the column widths are the same. I want to delimit this sheet by those known column widths, create lots of little graphic files (one for each cell), run OCR on them, and store the results in a database. The problem is that the horizontal lines are not evenly spaced, so I need to use some kind of graphics library command to check whether every pixel across a row is the same color (i.e. black). If so, I know I've reached the height delimiter for a cell. How would I go about doing that? (I'm using RMagick.)
Use `image#get_pixels`: http://www.simplesystems.org/RMagick/doc/image2.html#get_pixels

Warning: those docs are old, so the API may have changed in newer versions. Look at your own rdocs using `$ gem server`, assuming the gem has rdocs.

`image#rows` gives you the height of the image, so you can loop over every row, fetch that row's pixels with `get_pixels`, and check whether all of them are black (untested).

Please keep in mind that I'm not sure about the API: I'm looking at older docs and can't test it right now. But that looks like the general approach you would take. BTW, it assumes the row borders are 1 pixel thick. If not, change the `1` (the number of rows fetched per call) to the actual thickness, and that might be enough to make it work like you expect.
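As a rough illustration of the row-scanning idea, here is a minimal pure-Ruby sketch. It is not the code from the original answer; the helper names (`black_row?`, `separator_rows`, `cell_row_ranges`) and the `tolerance` parameter are made up for this example. It works on plain arrays of `[r, g, b]` triples so it can run without RMagick. With RMagick, each row could presumably be built from `img.get_pixels(0, y, img.columns, 1)` by mapping every pixel to `[p.red, p.green, p.blue]`, but that part is an untested assumption.

```ruby
# Hypothetical helpers, not part of RMagick.
# A "row" is an array of [r, g, b] triples; the whole image is an array of rows.

# True when every pixel in the row is black, allowing each channel to be
# at most `tolerance` above zero (scanned images are rarely pure black).
def black_row?(pixels, tolerance = 0)
  return false if pixels.empty?
  pixels.all? { |r, g, b| r <= tolerance && g <= tolerance && b <= tolerance }
end

# Indices of all rows that consist entirely of black pixels.
def separator_rows(rows, tolerance = 0)
  rows.each_index.select { |y| black_row?(rows[y], tolerance) }
end

# Groups the non-separator rows into contiguous ranges, one per cell row.
# Consecutive separator rows (thick borders) are skipped as a unit.
def cell_row_ranges(separators, height)
  is_separator = Array.new(height, false)
  separators.each { |y| is_separator[y] = true }
  (0...height).reject { |y| is_separator[y] }
              .slice_when { |a, b| b != a + 1 }
              .map { |run| (run.first..run.last) }
end
```

Because runs of consecutive separator rows are skipped together, this version does not depend on the borders being exactly 1 pixel thick. The `tolerance` value is on the same scale as the channel values, so with RMagick it would have to be chosen relative to the library's quantum range.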