For my project I am writing an image preprocessing library for scanned documents. Right now I am stuck on the line removal feature.
Problem Description:
A sample scanned form:
Name* : ______________________________
Age* : ______________________________
Email-ID: |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|
Note:
Following are the further conditions:
- The scanned document may contain many more vertical and horizontal guiding lines.
- Thickness of the lines may exceed 1px
- The document itself is not printed properly and might have noise in the form of ink bloating or uneven thickness
- The document might have colored background or lines
What I am trying to do is detect these lines and remove them without losing the handwritten content.
Solution so far:
The current solution is implemented in Java.
I detect these lines using a combination of Canny/Sobel edge detectors and a threshold filter (to make the image bitonal). This gives me a black-and-white array of pixels. I traverse the array and check whether the luminosity of each pixel falls below a specified bin value. If I find 30 such pixels in a row (30 being the minimum line length in pixels), I remove them. I repeat the same for vertical lines, taking into account that they will have cuts left by the horizontal line removal.
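To make the traversal concrete, here is a minimal sketch of the horizontal pass only. It assumes the thresholded image is already available as a `boolean[][]` where `true` means the pixel fell below the bin value. The class name `LineRunRemover`, the method name and the array representation are made up for illustration and are not my actual code. It also assumes the whole run is cleared once it reaches the minimum length.

```java
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class LineRunRemover {

    /**
     * Clears every horizontal run of dark pixels that is at least
     * minLength pixels long. dark[y][x] == true means the pixel is dark.
     * Returns the number of pixels that were cleared.
     */
    static int removeHorizontalRuns(boolean[][] dark, int minLength) {
        int cleared = 0;
        for (boolean[] row : dark) {
            int runStart = -1; // start index of the current dark run, -1 if none
            // Iterate one step past the end so a run touching the right edge is closed.
            for (int x = 0; x <= row.length; x++) {
                boolean isDark = x < row.length && row[x];
                if (isDark) {
                    if (runStart < 0) {
                        runStart = x;
                    }
                } else if (runStart >= 0) {
                    int runLength = x - runStart;
                    if (runLength >= minLength) {
                        // Assumption: the whole run is treated as a line and erased.
                        Arrays.fill(row, runStart, x, false);
                        cleared += runLength;
                    }
                    runStart = -1;
                }
            }
        }
        return cleared;
    }
}
```

Calling `LineRunRemover.removeHorizontalRuns(pixels, 30)` corresponds to the 30-pixel minimum mentioned above. The vertical pass would walk columns instead of rows in the same way.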
The solution seems to work, but there are problems:
- Removal of overlapping characters
- If characters in the image are not properly spaced, they are also treated as a line.
- The output image from edge detection is in black and white.
- It is a bit slow. It normally takes around 40 seconds for an image of 2480*3508 pixels.
Kindly advise how to do this properly and efficiently. If there is an open-source library for this, please point me to it.
Thanks
First, I want to mention that I know nothing about image processing in general, and about OCR in particular.
Still, a very simple heuristic comes to my mind:
The only problem I can see is when somebody writes letters on a horizontal line, like so:
In that case the line would remain, but you have to handle this case anyhow.
As I mentioned, I’m by no means an image processing expert, but sometimes very simple tricks work.