I need to extract (crop) the logotype (BEAVER) from the middle of a TIFF file that looks like this: http://i41.tinypic.com/2i7rbie.jpg
And then I need to automate the process so it can be repeated about 9 million times…
My guess is that I would have to use some OCR software. But is it possible for such software to “crop anything that starts below this point and ends above this point”?
Thoughts?
Typically, OCR software only extracts text from images and converts it into some text-based format; it does not crop. However, many OCR engines also report where each recognized word is located (its bounding box), so you can still use OCR technology to achieve your task. I would recommend the following:
The real challenge is the sheer number of images you want to process. You have to be very careful when defining your “smart rules” so that they don’t produce false positives, and you should always send suspicious images to a separate queue that you later review manually and use to update your rules.
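To make the “smart rules” idea concrete, here is a minimal Python sketch. It assumes your OCR engine gives you each recognized word with a bounding box and a confidence score (Tesseract, for example, can write this out as TSV or hOCR). The `Word` class, the band limits `y_min`/`y_max`, and the thresholds are hypothetical placeholders you would tune on your own images. It keeps only text that starts below one row and ends above another, which is the “crop anything between these points” rule from the question, and it routes anything doubtful to a review queue instead of guessing.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (left, top, right, bottom) in pixels, the same order Pillow's Image.crop() expects
Box = Tuple[int, int, int, int]


@dataclass
class Word:
    """One recognized word with its bounding box (hypothetical shape of OCR output)."""
    text: str
    left: int
    top: int
    width: int
    height: int
    conf: float  # recognition confidence, assumed to be on a 0-100 scale


def words_in_band(words: List[Word], y_min: int, y_max: int) -> List[Word]:
    """Keep non-empty words that start below row y_min and end above row y_max.

    Image rows grow downward, so "below y_min" means top >= y_min.
    """
    return [
        w for w in words
        if w.text.strip() and w.top >= y_min and w.top + w.height <= y_max
    ]


def union_box(words: List[Word], margin: int = 0) -> Box:
    """Smallest rectangle covering all words, padded by `margin` and clamped at 0."""
    left = min(w.left for w in words) - margin
    top = min(w.top for w in words) - margin
    right = max(w.left + w.width for w in words) + margin
    bottom = max(w.top + w.height for w in words) + margin
    return (max(left, 0), max(top, 0), right, bottom)


def route(words: List[Word], y_min: int, y_max: int,
          min_conf: float = 80.0, max_words: int = 3, margin: int = 5):
    """Apply simple, conservative rules.

    Returns ("ok", box) or ("suspicious", reason); anything doubtful goes to the
    manual-review queue instead of being cropped.
    """
    band = words_in_band(words, y_min, y_max)
    if not band:
        return ("suspicious", "no text in band")
    if len(band) > max_words:
        return ("suspicious", "too many words in band")
    if any(w.conf < min_conf for w in band):
        return ("suspicious", "low confidence")
    return ("ok", union_box(band, margin))
```

For example, with `words = [Word("BEAVER", 120, 300, 400, 90, 95.0), Word("Serial", 10, 20, 60, 15, 90.0)]`, calling `route(words, 250, 450)` returns `("ok", (115, 295, 525, 395))`: only BEAVER falls inside the band, and its box is padded by 5 pixels. A box in that order can be passed to Pillow's `Image.crop()`. The rules are deliberately strict, because a false crop is worse than an extra image in the review queue.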
In general it may look like this:
Most likely you will encounter some strange images that either contradict your existing rules or are simply wrong. You don’t always have to update your rules to accommodate them. There may be only a dozen such images in your entire 9-million collection. It might be better to leave them in the exceptions queue for manual processing rather than risk the stability of your magic rules.
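The exceptions-queue idea can be sketched the same way. This is a hypothetical batch driver, assuming you plug in your own `recognize` (image id to OCR result) and `rule` (OCR result to `("ok", box)` or `("suspicious", reason)`) functions. The 0.1% cut-off in `summarize` (9,000 of 9 million images) is an arbitrary example, not a recommendation. Counting exception reasons lets you see whether a problem affects enough images to justify changing a rule, or is one of the rare oddballs best left for manual processing.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple


def run_batch(image_ids: Iterable, recognize: Callable, rule: Callable):
    """Run OCR plus rules over many images.

    recognize: image_id -> OCR result (hypothetical hook for your OCR engine)
    rule:      OCR result -> ("ok", box) or ("suspicious", reason)
    Returns (accepted, exceptions): accepted maps image_id -> box, exceptions is
    a list of (image_id, reason) pairs for the manual-review queue.
    """
    accepted: Dict = {}
    exceptions: List[Tuple] = []
    for image_id in image_ids:
        try:
            status, detail = rule(recognize(image_id))
        except Exception as exc:
            # One unreadable or odd image must not stop a multi-million run.
            exceptions.append((image_id, f"error: {exc!r}"))
            continue
        if status == "ok":
            accepted[image_id] = detail
        else:
            exceptions.append((image_id, detail))
    return accepted, exceptions


def summarize(exceptions: List[Tuple], total: int,
              rule_change_threshold: float = 0.001):
    """Count exception reasons and flag those frequent enough to justify a rule change.

    Reasons at or below the threshold fraction of `total` stay in the manual queue.
    """
    counts = Counter(reason for _, reason in exceptions)
    worth_a_rule = [r for r, n in counts.items() if n / total > rule_change_threshold]
    return counts, worth_a_rule
```

Catching every exception per image is a deliberate choice here: at this scale some files will be corrupt or unexpected, and it is cheaper to queue them for a human than to restart the run.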