The accepted answer to the question C++ Library for image recognition: images containing words to string recommended that you:
- Upsize/Downsize your input image to 300 DPI.
How would I do this… I was under the impression that DPI was for monitors, not image formats.
I think the more accurate term here is resampling. You want a pixel resolution high enough to support accurate OCR. Font size (e.g. in points) is typically measured in units of length, not pixels. Since 72 points = 1 inch, we need 300/72 pixels-per-point for a resolution of 300 dpi (‘pixels-per-inch’). That means a typical 12-point font has a height (or more accurately, base-line to base-line distance in single-spaced text) of 50 pixels.
Ideally, your source documents should be scanned at an appropriate resolution for the given font size, so that the font in the image is about 50 pixels high. If the resolution is too high/low, you can easily resample the image using a graphics program (e.g. GIMP). You can also do this programmatically through a graphics library, such as ImageMagick which has interfaces for many programming languages.