I’m working on a project that entails photographing text (from any hard copy of

Question

Asked: June 13, 2026 (2026-06-13T14:50:38+00:00)

I’m working on a project that entails photographing text (from any hard copy of text) and converting that text into a text file. Then I’d like to use that text file to do some different things, such as provide hyperlinks to news articles or allow the user to edit the document.

The tool I’ve tried so far is Java OCR from sourceforge.net, which works fine on the images provided in the package. But when I photograph my own text, it doesn’t work at all. Is there some training process I should be implementing? If so, does anybody know how to implement it? Any help will go a long way. Thank you!

1 Answer

Editorial Team · Answer 1 · 2026-06-13T14:50:39+00:00

I have a Java application where I ended up deciding to use Tesseract OCR and simply call out to it using Runtime.exec(). Perhaps not quite the answer you need, but just in case you’d not considered it.

Edit + code added in response to comment reply

On a Windows installation I think I was able to use an installer, or unzip a ready-made binary.

On a Linux server, I needed to compile Tesseract myself, but it’s not too hard if you’re used to that kind of thing (gcc); the only gotcha is that there’s a dependency on Leptonica which also needs to be compiled.
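Whichever way it gets installed, the Java side only needs the tesseract binary to be reachable on the PATH. As a rough sketch of how that could be checked before attempting any OCR (assuming Java 9 or later and a Tesseract build that accepts --version; the class and method names here are made up for illustration):

```java
import java.io.IOException;

public class TesseractCheck {

    /** Returns true if the given command can be started and exits with status 0. */
    static boolean isRunnable(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    // Discard output so the child never blocks on a full pipe.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            return process.waitFor() == 0;
        } catch (IOException e) {
            // Typically means the binary was not found or is not executable.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: "tesseract" is the binary name on this machine.
        System.out.println("tesseract available: " + isRunnable("tesseract", "--version"));
    }
}
```

If this reports false on the server, the PATH seen by the Java process is the first thing worth checking.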

// Older Tesseract builds only accept TIFF input, so convert the image first.
// tmpFile[0] is the temporary .tif input; tmpFile[1] is the .txt file Tesseract writes.
ImageIO.write( ImageIO.read( new java.io.File(file.getPath())), "tif", tmpFile[0]);

// Tesseract appends ".txt" to the output base name itself, so strip it from the path here.
String[] tesseractCmd = new String[]{"tesseract", tmpFile[0].getAbsolutePath(), StringUtils.removeEnd(tmpFile[1].getAbsolutePath(), ".txt")};
final Process process = Runtime.getRuntime().exec(tesseractCmd);
try {
    // Block until Tesseract finishes; exit code 0 means success.
    int exitValue = process.waitFor();
    if(exitValue == 0) {
        // Read the recognized text back from the output file.
        final String extractedText = SearchableTextExtractionUtils.extractPlainText(new FileReader(tmpFile[1]));
        return extractedText;
    }
    throw new SearchableTextExtractionException(exitValue, Arrays.toString(tesseractCmd));
} catch (InterruptedException e) {
    throw new SearchableTextExtractionException(e);
} finally {
    process.destroy();
}
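The snippet above relies on helpers from my own project (and StringUtils from Commons Lang) that aren’t shown. For anyone who wants something standalone, here is a minimal sketch of the same idea using only the JDK. It assumes Java 11 or later, a tesseract binary on the PATH, and a Tesseract version that can read the input image format directly; TesseractRunner, buildCommand and recognize are illustrative names, not part of any library.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class TesseractRunner {

    /** Builds the command line. Tesseract appends ".txt" to the output base itself. */
    static List<String> buildCommand(Path image, Path outputTxt) {
        String out = outputTxt.toAbsolutePath().toString();
        String base = out.endsWith(".txt") ? out.substring(0, out.length() - 4) : out;
        return List.of("tesseract", image.toAbsolutePath().toString(), base);
    }

    /** Runs Tesseract on the image and returns the recognized text. */
    static String recognize(Path image) throws IOException, InterruptedException {
        Path outputTxt = Files.createTempFile("ocr", ".txt");
        try {
            Process process = new ProcessBuilder(buildCommand(image, outputTxt))
                    .redirectErrorStream(true)
                    // Discard console output so the child cannot block on a full pipe.
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            int exitValue = process.waitFor();
            if (exitValue != 0) {
                throw new IOException("tesseract exited with code " + exitValue);
            }
            return Files.readString(outputTxt, StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(outputTxt);
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java TesseractRunner path/to/photo.png
        System.out.println(recognize(Path.of(args[0])));
    }
}
```

Keeping the command construction in its own method makes it easy to check the ".txt" handling without having Tesseract installed.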
