I’m writing an OCR application to read characters from a screenshot image. Currently, I’m

Question


Asked: May 27, 2026

I’m writing an OCR application to read characters from a screenshot image. Currently, I’m focusing only on digits. I’m partially basing my approach on this blog post: http://blog.damiles.com/2008/11/basic-ocr-in-opencv/.

I can successfully extract each individual character using some clever thresholding. Where things get a bit tricky is matching the characters. Even with a fixed font face and size, there are some variables, such as background color and kerning, that cause the same digit to appear in slightly different shapes. For example, the image below is segmented into three parts:

Top: a target digit that I successfully extracted from a screenshot
Middle: the template: a digit from my training set
Bottom: the error (absolute difference) between the top and middle images

The parts have all been scaled (the distance between the two green horizontal lines represents one pixel).

[Image: target digit (top), template (middle), absolute difference (bottom)]

You can see that despite both the top and middle images clearly representing a 2, the error between them is quite high. This causes false positives when matching other digits. For example, it’s not hard to see how a well-placed 7 can match the target digit in the image above better than the middle image can.

Currently, I’m handling this by having a heap of training images for each digit, and matching the target digit against those images, one-by-one. I tried taking the average image of the training set, but that doesn’t resolve the problem (false positives on other digits).
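The one-by-one matching described above could be sketched roughly as follows. This is a minimal illustration using NumPy arrays rather than the `cv` wrappers from the question; the function names and the `training_set` dictionary (digit label mapped to a list of same-sized grayscale templates) are assumptions made for the example, not the asker's actual code.

```python
import numpy as np


def abs_diff_error(target, template):
    """Sum of absolute pixel differences between two same-sized images."""
    # Cast to a signed type first so uint8 subtraction cannot wrap around.
    a = np.asarray(target, dtype=np.int32)
    b = np.asarray(template, dtype=np.int32)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    return int(np.abs(a - b).sum())


def classify_digit(target, training_set):
    """Return (label, error) for the training image closest to the target.

    training_set is a hypothetical dict: digit label -> list of templates.
    """
    best_label, best_error = None, None
    for label, templates in training_set.items():
        for template in templates:
            error = abs_diff_error(target, template)
            if best_error is None or error < best_error:
                best_label, best_error = label, error
    return best_label, best_error
```

Because every pixel contributes independently, a one-pixel shift of a thin stroke is penalised as heavily as a completely different shape, which is consistent with the false positives described in the question.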

I’m a bit reluctant to perform matching using a shifted template (it’d be essentially the same as what I’m doing now). Is there a better way to compare the two images than simple absolute difference? I was thinking of maybe something like the EMD (earth mover’s distance, http://en.wikipedia.org/wiki/Earth_mover's_distance) in 2D. Basically, I need a comparison method that isn’t sensitive to global shifting or to small local changes (pixels next to a white pixel becoming white, or pixels next to a black pixel becoming black), but is sensitive to global changes (black pixels that are nowhere near white pixels becoming white, and vice versa).
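To make the desired property concrete, here is a sketch of one simple measure that behaves this way. It is not EMD, and it is not something proposed in this thread: it assumes both images have already been binarised (foreground pixels are `True`), and it counts only foreground pixels that have no foreground pixel within `radius` pixels in the other image. The names `dilate`, `tolerant_mismatch`, and `radius` are illustrative.

```python
import numpy as np


def dilate(mask, radius=1):
    """Grow the True region of a boolean mask by `radius` pixels (square window)."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros_like(mask)
    # OR together every shifted copy of the mask inside the window.
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def tolerant_mismatch(a, b, radius=1):
    """Count foreground pixels with no counterpart within `radius` pixels."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("images must have the same shape")
    unmatched_a = a & ~dilate(b, radius)  # strokes in a far from any stroke in b
    unmatched_b = b & ~dilate(a, radius)  # strokes in b far from any stroke in a
    return int(unmatched_a.sum() + unmatched_b.sum())
```

With `radius=1`, a stroke shifted or thickened by one pixel contributes nothing to the score, while a stroke that appears far from any stroke in the other image is counted in full. Whether this separates a 2 from a well-placed 7 better than absolute difference would have to be checked on real data.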

Can anybody suggest a more effective matching method than absolute difference?

I’m doing all this in OpenCV using the C-style Python wrappers (import cv).

1 Answer

Editorial Team · Answer 1 · 2026-05-27T19:35:34+00:00

I would look into using Haar cascades. I’ve used them for face detection/head tracking, and it seems like you could build up a pretty good set of cascades with enough ‘2’s, ‘3’s, ‘4’s, and so on.
