I’m currently at the point where I can convert a Bitmap into a byte array. Suppose I have 26 images representing a-z, with 26 corresponding byte arrays. Given an image, I would like to use its byte array to look up the correct letter directly rather than performing up to 26 comparisons. Is there some way of hashing the byte arrays to produce a hash code that can be stored in a configuration file?
Alternatively, if there is a better (faster) approach than hashing the images (assuming I have no access to the underlying textual representation), I would very much like to know about it. For clarification purposes, suppose I have “a.bmp”, “b.bmp” etc. I now have an unknown image on the screen. I would have thought hashing the image and performing a single lookup would be the fastest way to get a positive identification. It should be faster than performing up to 26 individual comparisons. If this assumption is incorrect, I would appreciate an outline of the optimal method.
Note: It’s not a classic OCR problem (handwriting recognition etc.) because the letters will be rendered identically every time. Therefore the letter “a” will always produce exactly the same hash code.
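You can find a C# algorithm to hash an array of bytes here. You can then use a C# hash table datatype (for example, Dictionary) to map each hash to its character. Note that hashing does not avoid reading the pixels. Computing the reference hashes means scanning every byte of every bitmap, which is O(B * N), where B is the number of bytes in a bitmap and N is the number of characters. That cost is paid only once if the hashes are stored; each lookup then has to hash just the unknown image, which is O(B), followed by a constant-time table lookup. Even so, O(B) per lookup is not free given the size of typical bitmaps, and a direct byte comparison can stop at the first mismatched byte, so for only 26 small images the practical gain may be modest.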
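As a minimal sketch of the hash-and-lookup idea, assuming SHA-256 as the hash (the question does not name one) and hypothetical names `GlyphLookup`, `HashBytes`, `BuildTable` and `Identify` chosen only for illustration, it could look something like this:

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical helper class; all names here are illustrative assumptions.
public static class GlyphLookup
{
    // Hex-encoded SHA-256 digest of the raw bitmap bytes.
    // The resulting string can be stored in a configuration file.
    public static string HashBytes(byte[] data)
    {
        using (var sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "");
        }
    }

    // Built once: hashes every reference image, O(B * N) in total.
    public static Dictionary<string, char> BuildTable(IDictionary<char, byte[]> references)
    {
        var table = new Dictionary<string, char>();
        foreach (var pair in references)
        {
            table[HashBytes(pair.Value)] = pair.Key;
        }
        return table;
    }

    // Per lookup: one O(B) hash of the unknown image plus one dictionary probe.
    // Returns null when the image matches no stored hash.
    public static char? Identify(Dictionary<string, char> table, byte[] unknown)
    {
        char letter;
        return table.TryGetValue(HashBytes(unknown), out letter) ? letter : (char?)null;
    }
}
```

A cryptographic hash is more than the problem strictly needs; it is used here only because accidental collisions between 26 images are then not a practical concern. With a cheaper hash, it would be safer to confirm a hit with one byte-by-byte comparison against the matched reference image.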
However, if this is OCR (optical character recognition), this hash function will be absolutely useless. The value of the hash changes greatly even if a single pixel is different, so typical optical noise from scanners or digital cameras would prevent two pictures of the same character from hashing identically. There are programmatic OCR techniques out there, but that is an extremely deep topic, and you’re much better off using a pre-built library if this is an OCR problem.
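To illustrate the sensitivity to noise, here is a small sketch, again assuming SHA-256 and using hypothetical names (`NoiseDemo`, `Sha256Hex`, `SameHashAfterFlip`), that flips a single bit in a copy of the pixel data and checks whether the two digests still match:

```csharp
using System;
using System.Security.Cryptography;

// Hypothetical demo class; the names are illustrative assumptions.
public static class NoiseDemo
{
    // Hex-encoded SHA-256 digest of the given bytes.
    public static string Sha256Hex(byte[] data)
    {
        using (var sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(data)).Replace("-", "");
        }
    }

    // Flips the lowest bit of one byte in a copy of the data and reports
    // whether the original and the altered copy hash to the same value.
    public static bool SameHashAfterFlip(byte[] pixels, int index)
    {
        var noisy = (byte[])pixels.Clone();
        noisy[index] ^= 0x01;
        return Sha256Hex(pixels) == Sha256Hex(noisy);
    }
}
```

For any realistic input this should return false, which is why an exact hash only helps when the rendering is bit-for-bit reproducible, as described in the question.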