I’m attempting to parse layout information from OCR engines with PHP, except they are


Asked: May 27, 2026 (2026-05-27T08:13:24+00:00)

I’m attempting to parse layout information from OCR engines with PHP, except they are not giving any details.

I have both Tesseract (with Leptonica) and Cuneiform installed. Supposedly Cuneiform is excellent at detecting layout (i.e. which regions are text, which are pictures, etc.). The inputs are PNG files containing both text and images (obviously the text is part of the image).

They all seem to think I want the output as txt or html or hocr… when what I want are the coordinates of what it thinks is text and what it thinks is an image.

Cuneiform has a “native” output option, the Cuneiform 2000 format. Opening a file in Notepad++, I can see that it’s compressed. I’ve tried extracting it with zip and gzip, but neither recognizes it. There is no info on Google about the native Cuneiform format either.

Anyone got any idea how to extract the layout information from Tesseract or Cuneiform… or got any better ideas to figure out the layout of an image containing text blocks and pictures?


1 Answer

Editorial Team · Answer 1 · 2026-05-27T08:13:24+00:00

Have a look at ABBYY FineReader Engine. Its API exposes detailed information about the recognized text, including its coordinates. It’s not free, but for business software ABBYY’s OCR technologies can add serious value to your product.

Since you are working on a web application in PHP, you may want to use the ABBYY OCR Engine web API at http://www.ocrsdk.com. It’s currently in closed beta, so for now it’s free to use.
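If you would rather stay with the tools you already have installed, it may be worth a second look at the hOCR output you set aside. hOCR is HTML in which each element carries a `title` attribute containing `bbox x0 y0 x1 y1`, so the coordinates are there to be read. Below is a minimal sketch of pulling them out, written in Python for brevity (the same regular expression should work with PHP's `preg_match`). The sample markup and the names `HocrBoxes` and `extract_boxes` are made up for illustration. Whether your engine and version actually emit picture regions (for example an `ocr_photo` class) is an assumption you would need to check against real output.

```python
import re
from html.parser import HTMLParser

# hOCR stores geometry in the title attribute as "bbox x0 y0 x1 y1".
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")


class HocrBoxes(HTMLParser):
    """Collects (class, (x0, y0, x1, y1)) for every element with a bbox."""

    def __init__(self):
        super().__init__()
        self.boxes = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        match = BBOX_RE.search(attr_map.get("title") or "")
        if match:
            coords = tuple(int(v) for v in match.groups())
            self.boxes.append((attr_map.get("class") or "", coords))


def extract_boxes(hocr_text):
    """Return all bounding boxes found in an hOCR document string."""
    parser = HocrBoxes()
    parser.feed(hocr_text)
    return parser.boxes


# Hand-written sample in hOCR style, not real engine output.
SAMPLE = """
<div class='ocr_page' title='bbox 0 0 800 600'>
  <div class='ocr_carea' title='bbox 36 92 580 122'>
    <span class='ocr_line' title='bbox 36 92 580 122; baseline 0 -5'>text</span>
  </div>
</div>
"""

if __name__ == "__main__":
    for css_class, box in extract_boxes(SAMPLE):
        print(css_class, box)
```

Filtering the result by class then separates block-level regions (`ocr_carea`) from lines and words; anything on the page not covered by a text block is a candidate picture region.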
