I am extracting a pdf into images / swf and text with the help

Question

Asked: May 14, 2026


I am extracting a PDF into images / SWF and text with the help of SWFTools and XPDF. I am running these in a PDF script.

Now I am trying to go one step further and get the TOC (table of contents) from the PDF. Is it possible to extract this information?


1 Answer

Editorial Team · Answer 1 · 2026-05-14T01:51:25+00:00

I found this with a little bit of searching. It looks rather promising.

PDFMiner: http://www.unixuser.org/~euske/python/pdfminer/index.html

Note: The tool is Python-based, but you should be able to run it from your script via shell access. Alternatively, you may be able to glean some useful information from the source code itself, as the project is open source.

From the Site:

dumppdf.py

dumppdf.py dumps the internal contents of a PDF file in pseudo-XML format. This program is primarily for debugging purposes, but it’s also possible to extract some meaningful contents (such as images).

Examples:
$ dumppdf.py -a foo.pdf
(dump all the headers and contents, except stream objects)

$ dumppdf.py -T foo.pdf
(dump the table of contents)

$ dumppdf.py -r -i6 foo.pdf > pic.jpeg
(extract a JPEG image)
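
If you would rather collect the TOC inside a script than read the dump by eye, here is a rough sketch that shells out to the -T example above and pulls the level and title out of each entry. I am assuming the pseudo-XML contains lines of the form <outline level="1" title="...">, so check the real output for your PDF and adjust the pattern if it differs. The names parse_outline and extract_toc are my own, not part of PDFMiner.

```python
import html
import re
import subprocess

# Assumed shape of one entry in the `dumppdf.py -T` output, e.g.
#   <outline level="1" title="Introduction">
# This is a guess at the format; adjust it to match what your copy prints.
OUTLINE_RE = re.compile(r'<outline\s+level="(\d+)"\s+title="([^"]*)"')


def parse_outline(dump_text):
    """Return a list of (level, title) pairs found in the dumped text."""
    entries = []
    for match in OUTLINE_RE.finditer(dump_text):
        level = int(match.group(1))
        # Undo XML-style escaping such as &amp; in the title attribute.
        title = html.unescape(match.group(2))
        entries.append((level, title))
    return entries


def extract_toc(pdf_path, dumppdf="dumppdf.py"):
    """Run `dumppdf.py -T pdf_path` and parse whatever it prints."""
    result = subprocess.run(
        [dumppdf, "-T", pdf_path],
        capture_output=True,  # needs Python 3.7 or newer
        text=True,
        check=True,           # raise if dumppdf.py exits with an error
    )
    return parse_outline(result.stdout)
```

Calling extract_toc("foo.pdf") should then give you a list you can print or hand back to your own script. A regular expression is used instead of an XML parser because the output is only pseudo-XML and may not be well-formed. A PDF without bookmarks has no outline to dump, so expect an empty list or an error in that case.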
