I want to read the existing OCR data embedded in .tif files using Java. This OCR data is created by MS Office Document Image Writer. I have searched a bit for open-source libraries, but I couldn't find any library or tool that can read the attached OCR data.
How can I get this OCR data from .tif files using Java?
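The OCR data created by MS Office Document Image Writer, along with the other metadata, can be retrieved using ExifTool. ExifTool is a command-line tool, so from Java you run it as an external process and read its output.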
Example:
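A minimal sketch, assuming the exiftool executable is installed and on the PATH. It starts ExifTool with ProcessBuilder and reads its output one outputLine at a time. The class name ExifToolReader, its methods, and the file name scan.tif are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ExifToolReader {

    // Builds the command line. Assumes "exiftool" can be found on the PATH.
    static List<String> buildCommand(String tifPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("exiftool");
        cmd.add(tifPath);
        return cmd;
    }

    // Runs ExifTool on the given file and returns every line it prints.
    static List<String> readMetadata(String tifPath) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(buildCommand(tifPath));
        pb.redirectErrorStream(true); // merge stderr into stdout so error messages are captured too
        Process process = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String outputLine;
            while ((outputLine = reader.readLine()) != null) {
                lines.add(outputLine);
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("exiftool exited with code " + exitCode);
        }
        return lines;
    }

    public static void main(String[] args) throws Exception {
        // "scan.tif" is a hypothetical default; pass a real path as the first argument.
        String path = args.length > 0 ? args[0] : "scan.tif";
        for (String outputLine : readMetadata(path)) {
            System.out.println(outputLine);
        }
    }
}
```

The output is read completely before waitFor() is called, so the process cannot block on a full output buffer.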
You can then parse the values you need from each outputLine and store them in an object for further handling, for example to save them in a database.
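The parsing step can be sketched as follows. This assumes ExifTool's default human-readable output, with one "Tag Description : value" pair per line. Each line is split at the first colon, since the descriptions themselves contain no colons while values such as dates may. The class name ExifToolLineParser is hypothetical, and repeated tags simply overwrite earlier ones.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ExifToolLineParser {

    // Splits one line of ExifTool's default "Tag Description : value" output at the first colon.
    // Returns null for lines without a colon (e.g. blank lines).
    static Map.Entry<String, String> parseLine(String outputLine) {
        int colon = outputLine.indexOf(':');
        if (colon < 0) {
            return null;
        }
        String key = outputLine.substring(0, colon).trim();
        String value = outputLine.substring(colon + 1).trim();
        return Map.entry(key, value);
    }

    // Collects all parsed lines into an insertion-ordered map that can be copied
    // into a domain object or written to a database. Later duplicates overwrite earlier ones.
    static Map<String, String> parseAll(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            Map.Entry<String, String> entry = parseLine(line);
            if (entry != null && !entry.getKey().isEmpty()) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines in ExifTool's default output format.
        List<String> sample = List.of(
                "File Name                       : scan.tif",
                "Modify Date                     : 2020:01:02 03:04:05");
        System.out.println(parseAll(sample));
    }
}
```

If the plain-text format proves fragile, ExifTool's -j option prints the metadata as JSON instead, which a JSON library can parse more reliably.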