How can I convert PDF files to HTML with Python?
I was thinking of something along the lines of what Google does (or seems to do) to index PDF files.
My final goal is to set up Apache to serve the HTML versions of the PDF files, so anything pointing me in that direction would also be appreciated.
The poppler package provides a pdftohtml command-line utility (shipped in the poppler-utils package on Debian/Ubuntu) that you might be able to use. There is also a Python binding to libpoppler.
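A minimal sketch of driving pdftohtml from Python with subprocess, assuming the tool is installed and on your PATH. It uses -noframes (one HTML file instead of a frameset) and -i (skip image extraction). Check `pdftohtml -h` on your system, because the exact output file naming can differ between versions. The function names below are just for illustration.

```python
import shutil
import subprocess
import sys


def build_pdftohtml_command(pdf_path, output_base):
    """Build the argument list for poppler's pdftohtml.

    -noframes: write a single HTML document instead of a frameset.
    -i: ignore images (drop this flag if you want images extracted too).
    """
    return ["pdftohtml", "-noframes", "-i", str(pdf_path), str(output_base)]


def convert_pdf_to_html(pdf_path, output_base):
    """Run pdftohtml; raises if the tool is missing or exits with an error."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml not found; install poppler-utils")
    # check=True turns a non-zero exit status into CalledProcessError
    subprocess.run(build_pdftohtml_command(pdf_path, output_base), check=True)


if __name__ == "__main__":
    # usage: python convert.py input.pdf output_base
    convert_pdf_to_html(sys.argv[1], sys.argv[2])
```

For the Apache goal, one option is to run something like this as a batch job over your PDF directory and let Apache serve the generated HTML files as ordinary static content.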