I have tons of PDFs that I need to convert to some structured format that

Question

Editorial Team

Asked: June 12, 2026 (2026-06-12T03:59:38+00:00)

I have tons of PDFs that I need to convert to a structured format that I can parse (HTML, XML, etc.).

PDFs are in this format:
http://img840.imageshack.us/img840/5407/pdfv.png

So far I have tried many programs that convert PDF to HTML, but none of them can separate out the images. They just take what amounts to a screenshot of the page without the text, use that image as the background of the HTML, and position the text over it with CSS.

Like this: http://img37.imageshack.us/img37/5015/examplelp.jpg

I have a lot of PDFs, so processing each one's images manually is not an option. Does anyone know of a solution for this (paid software included)?

1 Answer

Editorial Team · Answer 1 · 2026-06-12T03:59:39+00:00

I had a similar problem a while back and ended up writing my own solution. It’s called PDFX and it’s free to use. It converts PDF to structured XML and also outputs any bitmap images (not vector graphics) found in the PDF as separate files.

Example input/output can be found here. You might want to give it a try.
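
Since the question is about a whole folder of PDFs, here is a minimal batch sketch. It does not use PDFX; it assumes the Poppler command-line tools `pdftohtml` and `pdfimages` are installed and on the PATH, which is my substitution and not something from the original answer. The function names `build_commands` and `convert_all` are made up for illustration.

```python
from pathlib import Path
import subprocess


def build_commands(pdf: Path, out_dir: Path) -> list[list[str]]:
    """Build the two Poppler commands for one PDF (assumed tools, not PDFX)."""
    target = out_dir / pdf.stem
    return [
        # -xml asks for XML output; -i tells pdftohtml to skip images,
        # because pdfimages handles them in the next command.
        ["pdftohtml", "-xml", "-i", str(pdf), str(target)],
        # -png asks pdfimages to write embedded bitmaps as PNG files,
        # using `target` as the file name prefix.
        ["pdfimages", "-png", str(pdf), str(target)],
    ]


def convert_all(src_dir: Path, out_dir: Path, run=subprocess.run) -> int:
    """Run both commands for every *.pdf in src_dir; return the PDF count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for pdf in sorted(src_dir.glob("*.pdf")):
        for cmd in build_commands(pdf, out_dir):
            # check=True raises CalledProcessError if a tool exits non-zero.
            run(cmd, check=True)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical folder names; adjust to your own layout.
    done = convert_all(Path("pdfs"), Path("converted"))
    print(f"Converted {done} PDF(s)")
```

The `run` parameter is only there so the loop can be dry-run or tested without the tools installed. Like PDFX, this approach only pulls out bitmap images; vector graphics are not extracted as images.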
