I am trying to write a document that can only be read by humans; its content must not be copyable. To that end, I am converting its pages to images and adding them back to a PDF file. The main issue is that any OCR program can recover the whole text, especially since the pages will be clean (unlike a scanned book), which increases OCR accuracy.
So, is there a font that can’t be recognized by OCR? Otherwise, is there a technique that would make my document readable by humans yet unrecognizable by OCR (for instance, adding a specific background)?
Thank you in advance.
In general, OCR does not recognize text by identifying its font. Instead, it analyzes the features and shapes of the characters: it looks for similarities in the open areas and shapes of the letters in the file being scanned for conversion. (That’s why it can also recognize handwritten documents, which don’t use any font at all.)
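To make the "open areas" idea concrete, here is a toy sketch in Python. It is not how any real OCR engine is implemented; the 5x5 glyph bitmaps and the `count_holes` helper are made up for illustration. It counts the enclosed background regions of a glyph, one simple shape feature that stays the same when the letter is drawn in a different style.

```python
from collections import deque


def count_holes(bitmap):
    """Count enclosed background regions ("open areas") in a glyph.

    `bitmap` is a list of equal-length strings: '#' is ink, '.' is background.
    """
    height, width = len(bitmap), len(bitmap[0])
    # Pad with a one-cell background border so that all outside
    # background forms a single connected region.
    grid = [["."] * (width + 2)]
    grid += [["."] + list(row) + ["."] for row in bitmap]
    grid += [["."] * (width + 2)]

    def flood(start_r, start_c):
        # Mark every background cell 4-connected to the start cell.
        grid[start_r][start_c] = "x"
        queue = deque([(start_r, start_c)])
        while queue:
            r, c = queue.popleft()
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < height + 2 and 0 <= nc < width + 2 and grid[nr][nc] == ".":
                    grid[nr][nc] = "x"
                    queue.append((nr, nc))

    flood(0, 0)  # remove the outside background first
    holes = 0
    for r in range(height + 2):
        for c in range(width + 2):
            if grid[r][c] == ".":  # background not reachable from outside
                holes += 1
                flood(r, c)
    return holes


# Hypothetical glyphs: the same letter "O" in two different "fonts".
O_BLOCK = ["#####", "#...#", "#...#", "#...#", "#####"]
O_ROUND = [".###.", "#...#", "#...#", "#...#", ".###."]
B_GLYPH = ["####.", "#...#", "####.", "#...#", "####."]
L_GLYPH = ["#....", "#....", "#....", "#....", "#####"]

if __name__ == "__main__":
    for name, glyph in [("O block", O_BLOCK), ("O round", O_ROUND),
                        ("B", B_GLYPH), ("L", L_GLYPH)]:
        print(name, count_holes(glyph))
```

Both versions of "O" yield the same hole count even though their outlines differ, which is the sense in which recognition depends on shape features rather than on the font. Real engines combine many such features, or learn them from data.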
This process of identifying text through its features is known as Intelligent Character Recognition.
I don’t think there is a definite answer to the question of which font would make a document unreadable by OCR. But to make things harder for a general OCR, try a calligraphic font that doesn’t follow the usual character features and is therefore hard for software to read (this is also the main idea behind CAPTCHA).
But again, while this may give a general OCR a hard time, it is not a 100% reliable solution, and it will also make the text really hard for any human to read.
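As a rough illustration of the CAPTCHA-style idea mentioned above, the following Python sketch shifts each row of a small text bitmap sideways along a sine wave. The `wave_distort` function, its parameters, and the sample bitmap are all hypothetical, and nothing here guarantees that a given OCR engine will fail on the result; it only shows the kind of distortion involved.

```python
import math


def wave_distort(bitmap, amplitude=2, period=6.0):
    """Shift each row left or right by a sine-based offset.

    `bitmap` is a list of equal-length strings ('#' ink, '.' background).
    The result is 2 * amplitude columns wider so no ink is cut off.
    """
    distorted = []
    for y, row in enumerate(bitmap):
        # Offset in the range [-amplitude, +amplitude] for this row.
        shift = round(amplitude * math.sin(2 * math.pi * y / period))
        left_pad = "." * (amplitude + shift)
        right_pad = "." * (amplitude - shift)
        distorted.append(left_pad + row + right_pad)
    return distorted


# Hypothetical sample: a vertical bar, which becomes a wavy stroke.
BAR = ["..#..", "..#..", "..#..", "..#..", "..#..", "..#.."]

if __name__ == "__main__":
    for line in wave_distort(BAR):
        print(line)
```

Stronger distortion makes the shapes drift further from what a recognizer expects, but, as noted above, it makes them harder for human readers too.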