I’m trying to convert PDF to HTML programmatically. So far I’ve been using pdftohtml, but our users are not happy with the results.
Here’s what I need:
- I’m using Ruby on Rails, but any tool that runs on Unix would work, since I can call it from the command line. A nice gem or plugin would of course be perfect.
- I’d prefer it to be open source.
- It needs to be able to handle images.
- It would be nice to have an option to discard images if needed.
- It needs to be stable.
- It needs to return HTML with a layout close to the original PDF (I’ve tried pdftohtml and the result is not that good in a lot of cases).
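For reference, here is a minimal sketch of how a command-line converter can be called from Ruby with images optionally discarded. It assumes poppler's pdftohtml is on the PATH and uses its `-c` (complex output), `-noframes` and `-i` (ignore images) flags. The helper names `pdftohtml_command` and `convert_pdf_to_html` are made up for illustration, and any other Unix tool could be swapped in the same way.

```ruby
require "open3"

# Hypothetical helper: builds the argument list for poppler's pdftohtml.
# -c        complex output, tries to stay closer to the PDF layout
# -noframes no frameset wrapper around the output
# -i        ignore images (only added when discard_images is true)
def pdftohtml_command(pdf_path, out_base, discard_images: false)
  cmd = ["pdftohtml", "-c", "-noframes"]
  cmd << "-i" if discard_images
  cmd + [pdf_path, out_base]
end

# Hypothetical helper: runs the converter and raises if it exits non-zero.
# Passing the command as separate arguments avoids shell quoting issues.
def convert_pdf_to_html(pdf_path, out_base, discard_images: false)
  cmd = pdftohtml_command(pdf_path, out_base, discard_images: discard_images)
  _stdout, stderr, status = Open3.capture3(*cmd)
  raise "pdftohtml failed: #{stderr}" unless status.success?
  true
end
```

Something like `convert_pdf_to_html("report.pdf", "report", discard_images: true)` would then run the conversion without images. The exact output file names depend on the pdftohtml version, so it is worth checking what your installation writes.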
Here are a couple more alternatives to pdftohtml/xpdf: