The Archive Base

Editorial Team
Asked: May 30, 2026


I am trying to implement a “Digit Recognition OCR” in OpenCV-Python (cv2). It is just for learning purposes. I would like to learn both KNearest and SVM features in OpenCV.

I have 100 samples (i.e. images) of each digit. I would like to train with them.

There is a sample, letter_recog.py, that comes with the OpenCV samples, but I still couldn’t figure out how to use it. I don’t understand what the samples, responses, etc. are. It also loads a txt file at the start, which I didn’t understand at first.

Later, after searching a little, I found letter_recognition.data in the cpp samples. I used it to write some code for cv2.KNearest, modeled on letter_recog.py (just for testing):

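import numpy as np
import cv2

fn = 'letter-recognition.data'
# Convert the leading letter of each row to a class index ('A' -> 0, ..., 'Z' -> 25).
a = np.loadtxt(fn, np.float32, delimiter=',', converters={ 0 : lambda ch : ord(ch)-ord('A') })
# Column 0 holds the labels; the remaining columns hold the features.
samples, responses = a[:,1:], a[:,0]

# cv2.KNearest is the OpenCV 2.x API.
model = cv2.KNearest()
retval = model.train(samples,responses)
# Classify the training rows themselves using their 10 nearest neighbours.
retval, results, neigh_resp, dists = model.find_nearest(samples, k = 10)
print(results.ravel())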

It gave me an array of size 20000, and I don’t understand what it is.

Questions:

1) What is the letter_recognition.data file? How do I build that file from my own data set?

2) What does results.ravel() denote?

3) How can we write a simple digit recognition tool using the letter_recognition.data file (either KNearest or SVM)?

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 11:44 pm

    Well, I decided to work through my own question and solve the problem above. What I wanted was to implement a simple OCR using the KNearest or SVM features in OpenCV. Below is what I did and how. (It is just for learning how to use KNearest for simple OCR purposes.)

    1) My first question was about the letter_recognition.data file that comes with the OpenCV samples. I wanted to know what is inside that file.

    It contains a letter, along with 16 features of that letter.

    A Stack Overflow answer helped me find this. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers.
    (Although I didn’t understand some of the features toward the end.)
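To make the file layout concrete, here is a minimal sketch that parses one record, assuming each line is a capital letter followed by 16 comma-separated numbers. The function name parse_letter_line and the example record are made up for illustration; the record is not taken from the real file.

```python
def parse_letter_line(line):
    """Split one comma-separated record into (label_index, features)."""
    fields = line.strip().split(',')
    # Map the leading letter to a class index: 'A' -> 0, ..., 'Z' -> 25.
    label = ord(fields[0]) - ord('A')
    # The remaining fields are the numeric features.
    features = [float(v) for v in fields[1:]]
    return label, features

# Made-up record in the assumed layout: a letter followed by 16 numbers.
example = "C,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,0"
label, features = parse_letter_line(example)
print(label, len(features))
```

This is the same letter-to-index conversion that the converters argument of np.loadtxt performs in the question's code.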

    2) I knew that without understanding all those features, it would be difficult to use that method. I tried some other papers, but all were a little difficult for a beginner.

    So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance; I just wanted it to work, even with minimal accuracy.)

    I took the image below as my training data:

    [image: training data]

    (I know the amount of training data is small. But since all the digits are of the same font and size, I decided to try it.)

    To prepare the data for training, I wrote a small script in OpenCV. It does the following:

    1. Loads the image.
    2. Selects the digits (by finding contours and applying constraints on the area and height of the letters to avoid false detections).
    3. Draws a bounding rectangle around one letter and waits for a key press. At this point we press the digit key corresponding to the letter in the box.
    4. Once the corresponding digit key is pressed, resizes the box to 10×10 and saves all 100 pixel values in an array (here, samples) and the manually entered digit in another array (here, responses).
    5. Saves both arrays in separate text files.
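Steps 4 and 5 can be sketched with NumPy alone. This is only an illustration under stated assumptions: the helper add_sample, the stand-in patches, and the temporary file names are hypothetical, and in the real script each patch would come from cv2.resize(roi, (10, 10)).

```python
import os
import tempfile
import numpy as np

def add_sample(samples, responses, roi_10x10, digit):
    """Append one flattened 10x10 patch and its manually entered label."""
    row = np.asarray(roi_10x10, dtype=np.float32).reshape(1, 100)
    samples = np.append(samples, row, axis=0)
    responses.append(digit)
    return samples, responses

samples = np.empty((0, 100), dtype=np.float32)
responses = []

# Stand-in patches (hypothetical data, not real digit images).
blank = np.zeros((10, 10), dtype=np.uint8)
filled = np.full((10, 10), 255, dtype=np.uint8)
samples, responses = add_sample(samples, responses, blank, 0)
samples, responses = add_sample(samples, responses, filled, 8)

# Save both arrays as plain text, then load them back for training.
out_dir = tempfile.mkdtemp()
samples_path = os.path.join(out_dir, 'samples.data')
responses_path = os.path.join(out_dir, 'responses.data')
np.savetxt(samples_path, samples)
np.savetxt(responses_path, np.array(responses, np.float32).reshape(-1, 1))

loaded_samples = np.loadtxt(samples_path, np.float32)
loaded_responses = np.loadtxt(responses_path, np.float32).reshape(-1, 1)
print(loaded_samples.shape, loaded_responses.shape)
```

Each row of samples is one digit image and the matching row of responses is its label, which is the pairing the classifier expects.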

    At the end of the manual classification, all the digits in the training data (train.png) have been labeled by hand, and the image looks like the one below:

    [image: labeled training data]

    Below is the code I used for this (of course, not so clean):

    import sys

    import numpy as np
    import cv2

    im = cv2.imread('pitrain.png')
    im3 = im.copy()

    gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray,(5,5),0)
    # Inverted adaptive threshold (Gaussian-weighted, 11x11 neighbourhood, C = 2).
    thresh = cv2.adaptiveThreshold(blur,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY_INV,11,2)

    #################      Now finding Contours         ###################

    # Returns two values in OpenCV 2.x and 4.x; OpenCV 3.x returns three.
    contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

    samples =  np.empty((0,100))
    responses = []
    keys = list(range(48,58))  # key codes for the digits '0' to '9'

    for cnt in contours:
        # Skip small contours (noise).
        if cv2.contourArea(cnt)>50:
            [x,y,w,h] = cv2.boundingRect(cnt)

            # Keep only boxes tall enough to be a digit.
            if  h>28:
                cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
                roi = thresh[y:y+h,x:x+w]
                roismall = cv2.resize(roi,(10,10))
                cv2.imshow('norm',im)
                key = cv2.waitKey(0)

                if key == 27:  # (escape to quit)
                    sys.exit()
                elif key in keys:
                    # Store the typed digit as the label and the 100 pixels as the sample.
                    responses.append(int(chr(key)))
                    sample = roismall.reshape((1,100))
                    samples = np.append(samples,sample,0)

    responses = np.array(responses,np.float32)
    responses = responses.reshape((responses.size,1))
    print("training complete")

    np.savetxt('generalsamples.data',samples)
    np.savetxt('generalresponses.data',responses)
    

    Now we move on to the training and testing part.

    For testing, I used the image below, which has the same type of letters I used in the training phase.

    [image: test image]

    For training, we do the following:

    1. Load the text files we saved earlier.
    2. Create an instance of the classifier we are using (KNearest in this case).
    3. Use the KNearest.train function to train on the data.

    For testing, we do the following:

    1. Load the image used for testing.
    2. Process the image as before and extract each digit using contour methods.
    3. Draw a bounding box around it, resize it to 10×10, and store its pixel values in an array as before.
    4. Use the KNearest.find_nearest() function to find the nearest item to the one we gave. (If lucky, it recognizes the correct digit.)
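Conceptually, a nearest-neighbour lookup returns one predicted label per input row, which would also explain the array of size 20000 in the question: one prediction for each of the rows passed in. The NumPy sketch below illustrates the idea only; it is not OpenCV's implementation, and the function name and toy data are made up.

```python
import numpy as np

def find_nearest(train_samples, train_labels, queries, k=1):
    """Return the majority label among the k nearest training rows for each query."""
    # Squared Euclidean distance between every query row and every training row.
    diffs = queries[:, None, :] - train_samples[None, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Indices of the k closest training rows for each query.
    nearest = np.argsort(dists, axis=1)[:, :k]
    neighbour_labels = train_labels[nearest]  # shape (n_queries, k)
    # Majority vote over the neighbours' labels.
    results = np.array([np.bincount(row).argmax() for row in neighbour_labels])
    return results, neighbour_labels

# Toy data: two points labeled 3 near the origin, two labeled 7 near (10, 10).
train_samples = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=np.float32)
train_labels = np.array([3, 3, 7, 7])
queries = np.array([[1, 0], [9, 10]], dtype=np.float32)

results, neighbour_labels = find_nearest(train_samples, train_labels, queries, k=1)
print(results)
```

With k = 1 the prediction is simply the label of the single closest training row, which is the setting used for the digit test in this answer.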

    I included the last two steps (training and testing) in the single script below:

    import cv2
    import numpy as np

    #######   training part    ###############
    samples = np.loadtxt('generalsamples.data',np.float32)
    responses = np.loadtxt('generalresponses.data',np.float32)
    responses = responses.reshape((responses.size,1))

    # cv2.KNearest is the OpenCV 2.x API.
    model = cv2.KNearest()
    model.train(samples,responses)

    ############################# testing part  #########################

    im = cv2.imread('pi.png')
    out = np.zeros(im.shape,np.uint8)
    gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
    # Inverted adaptive threshold (Gaussian-weighted, 11x11 neighbourhood, C = 2).
    thresh = cv2.adaptiveThreshold(gray,255,cv2.ADAPTIVE_THRESH_GAUSSIAN_C,cv2.THRESH_BINARY_INV,11,2)

    # Returns two values in OpenCV 2.x and 4.x; OpenCV 3.x returns three.
    contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)

    for cnt in contours:
        # Same size filters as in the data collection script.
        if cv2.contourArea(cnt)>50:
            [x,y,w,h] = cv2.boundingRect(cnt)
            if  h>28:
                cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
                roi = thresh[y:y+h,x:x+w]
                # Build the same 100-value feature row used for training.
                roismall = cv2.resize(roi,(10,10))
                roismall = roismall.reshape((1,100))
                roismall = np.float32(roismall)
                retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
                # results holds one predicted label per input row.
                string = str(int((results[0][0])))
                cv2.putText(out,string,(x,y+h),cv2.FONT_HERSHEY_SIMPLEX,1,(0,255,0))

    cv2.imshow('im',im)
    cv2.imshow('out',out)
    cv2.waitKey(0)
    

    And it worked. Below is the result I got:

    [image: recognition result]


    Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.

    But anyway, this is a good starting point for beginners (I hope so).
