I put together a quick WinForm/embedded IE browser control which logs into our company’s bank website each morning and scrapes/exports the desired deposit information (the bank is a smallish regional bank). Since we have a few dozen “pseudoaccounts” that draw from the same master account, this actually takes 10-15 minutes to retrieve.
Anyway, the only problem is that our business bank account requires an RSA security token (http://www.rsa.com/node.aspx?id=1156). If you are not familiar, it is a small device which shows a random 6 digit number every 15(?) seconds, so I have to prompt for this value before starting. This is on top of the website’s login based security model, so even if you create a read-only account that can’t do anything, you still have to put the RSA number in. We have 5 of these tokens for different people in the company.
From our perspective this is nuisance security. I was joking about using a web camera to OCR the digits from the key fob so they didn’t have to type it in, mainly so that the scraping/export would be done before anyone arrives in the morning. Well, they asked if I could really do it.
So now I ask you, how hard (how many hours) do you think it would take to OCR these digits reliably from a JPEG image produced by the camera? I already know I can get the JPEG easily. I think you get 3 tries to log in, so it really needs to hit a 99% accuracy rate. I could work on this on my off time, but they don’t want me to put more than a few hours into it, so I want to leverage as much existing code as possible. This is a 7-segment display (like an alarm clock) so it’s not exactly text that an OCR package would be used to seeing.
Also, there is a countdown timer on the side of the display; typically when it is down to 1 bar, you wait until the next number appears and it starts over at 5 bars (like signal strength on your cell phone). So this would need to be OCR’d as well, but it is not text.
Anyway the more I think about it as I type this, the less convinced I am that I can truly get this right, so maybe I should just work on it in my spare time?
This is actually easier than it might at first appear. I’ve used this technique in the past, based on the fact that the digits always look the same, and always appear in the same locations.
Just create ten little masks, one for each of the digits 0-9, and prepare a script that splits your one JPEG image into pieces, one for each digit position. Line up the camera once, then leave it like that. For each digit cut from the image, multiply its pixel values by the corresponding pixel values in each of the ten masks and add up the products, then choose the mask with the highest total. That is the mask the digit fits best, and so the digit it shows.
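Here is a rough sketch of the mask idea in Python, just to show the arithmetic. Everything specific in it is my assumption, not something from the device: the 5x7 grid per digit, the segment positions, and the names (make_mask, classify, read_code). It also assumes the JPEG has already been cropped, converted to brightness values between 0 and 1, and scaled down so each digit is one 5x7 cell, with 1.0 meaning "segment on" (invert first if your segments show dark on light). One detail I added: the masks use +1/-1 and the pixels are shifted to be signed, because with plain 0/1 values the "8" mask would score at least as high as the right one for every digit.

```python
# Hypothetical 5x7 layout of the seven segments, as (row, col) cells.
SEGMENTS = {
    "a": [(0, 1), (0, 2), (0, 3)],  # top
    "b": [(1, 4), (2, 4)],          # upper right
    "c": [(4, 4), (5, 4)],          # lower right
    "d": [(6, 1), (6, 2), (6, 3)],  # bottom
    "e": [(4, 0), (5, 0)],          # lower left
    "f": [(1, 0), (2, 0)],          # upper left
    "g": [(3, 1), (3, 2), (3, 3)],  # middle
}
# Standard 7-segment encoding of each digit.
DIGIT_SEGMENTS = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg", 9: "abcdfg",
}
ROWS, COLS = 7, 5


def make_mask(digit):
    """Grid with +1 where a segment should be on and -1 everywhere else."""
    mask = [[-1] * COLS for _ in range(ROWS)]
    for seg in DIGIT_SEGMENTS[digit]:
        for r, c in SEGMENTS[seg]:
            mask[r][c] = 1
    return mask


MASKS = {d: make_mask(d) for d in range(10)}


def score(mask, cell):
    """Sum of mask * pixel, with pixels shifted from 0..1 to -1..+1."""
    return sum(
        m * (2 * p - 1)
        for mask_row, pixel_row in zip(mask, cell)
        for m, p in zip(mask_row, pixel_row)
    )


def classify(cell):
    """Return the digit whose mask gives the highest score for this cell."""
    return max(MASKS, key=lambda d: score(MASKS[d], cell))


def split_digits(image, n_digits=6):
    """Cut a ROWS x (n_digits * COLS) image into one cell per digit."""
    return [
        [row[i * COLS:(i + 1) * COLS] for row in image]
        for i in range(n_digits)
    ]


def read_code(image, n_digits=6):
    """Classify each digit cell and join the results into a string."""
    return "".join(str(classify(cell)) for cell in split_digits(image, n_digits))
```

With a real camera you would use much larger masks made from actual photos of each digit, but the scoring step stays the same. It is also worth comparing the best score to the second best and refusing to log in when they are close, since you only get a few tries.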
Disclaimer: I don’t think this is a great idea for security reasons, as other commenters have pointed out.