I’m trying to implement search result highlighting for PDFs in a web app. I have the original PDFs, and small PNG versions that are used in search results. Essentially I’m looking for an API like:
pdf_document.find_offsets('somestring') # => [{ top: 501, left: 100, bottom: 520, right: 150 }, { ... another box ... }, ...]
I know it’s possible to get this information out of a pdf because Apple’s Preview.app implements this.
I need something that runs on Linux and is ideally open source. I’m aware you can do this with Acrobat on Windows.
Take a look at PDFlib TET: http://www.pdflib.com/products/tet/
(it’s not free)
Fabrizio
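For what it’s worth, once whatever extraction tool you end up with gives you per-word boxes, mapping them onto the small PNGs should mostly be a matter of scaling. Here is a rough Ruby sketch of the `find_offsets` idea from the question. Everything in it is an assumption for illustration: the `Word` struct, the keyword arguments, and the premise that the boxes are in page units with the origin at the top-left (PDF coordinates natively start at the bottom-left, so you may need to flip the y axis first). It only matches single whole words, case-insensitively.

```ruby
# Hypothetical word record: the text plus its box in page coordinates
# (assumed origin: top-left corner of the page).
Word = Struct.new(:text, :left, :top, :right, :bottom)

# Returns one box per occurrence of `query`, scaled from page units
# to the pixel dimensions of the thumbnail image.
def find_offsets(words, query, page_width:, page_height:, image_width:, image_height:)
  sx = image_width.to_f / page_width   # horizontal scale factor
  sy = image_height.to_f / page_height # vertical scale factor

  words.select { |w| w.text.casecmp?(query) }.map do |w|
    {
      top:    (w.top * sy).round,
      left:   (w.left * sx).round,
      bottom: (w.bottom * sy).round,
      right:  (w.right * sx).round
    }
  end
end

# Example: a 612x792 page rendered as a 306x396 PNG (half size).
words = [
  Word.new("hello",      50, 100, 150, 140),
  Word.new("somestring", 200, 100, 300, 140)
]
boxes = find_offsets(words, "somestring",
                     page_width: 612, page_height: 792,
                     image_width: 306, image_height: 396)
```

Phrase searches would need to match runs of consecutive words and merge their boxes, which this sketch does not attempt.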