I have a PDF document containing several images. I want to retrieve the names of these images.


Editorial Team

Asked: June 11, 2026


I have a PDF document containing several images.

I want to retrieve the names of these images.

How can I achieve this using either iText or PDFBox?

I know that ExtractImages extracts images from a PDF, and I suspect it has functionality somewhere to fetch the name of each image. However, I don't know how to use ExtractImages.

The actual reason I want the image names is to compress these images and thereby reduce the size of the PDF. Is my approach correct?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T00:32:38+00:00

With PDFBox, what you can get is each image's key in the page's resources (the name the page uses to refer to it) and its suffix (file type). You can also save the image to a file:

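    import java.io.File;
    import java.io.IOException;
    import java.util.List;
    import java.util.Map;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;
    import org.apache.pdfbox.pdmodel.PDResources;
    import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectImage;

    // PDFBox 1.8.x API; the enclosing method must declare "throws IOException"
    String prefix = new File(pdfFilename).getName();
    int ext = prefix.toLowerCase().lastIndexOf(".pdf");
    if (ext > 0)
    {
        prefix = prefix.substring(0, ext); // strip the ".pdf" extension
    }

    PDDocument document = null;
    try
    {
        // the non-sequential parser is preferable to the older sequential PDDocument.load()
        document = PDDocument.loadNonSeq(new File(pdfFilename), null);

        @SuppressWarnings("unchecked")
        List<PDPage> pages = document.getDocumentCatalog().getAllPages();
        System.out.println(pdfFilename + ": Total pages: " + pages.size());
        int p = 0;
        for (PDPage page : pages)
        {
            ++p;
            PDResources resources = page.getResources();
            if (resources == null)
            {
                continue; // page has no resources of its own
            }
            Map<String, PDXObjectImage> imageResources = resources.getImages();
            for (Map.Entry<String, PDXObjectImage> entry : imageResources.entrySet())
            {
                String key = entry.getKey(); // the image's resource name
                PDXObjectImage objectImage = entry.getValue();
                System.out.printf("image key '%s': %d x %d, type %s%n",
                        key, objectImage.getWidth(), objectImage.getHeight(), objectImage.getSuffix());

                // write the image; write2file appends the matching suffix to this base name
                String fname = String.format("%s-%04d-%s", prefix, p, key);
                objectImage.write2file(fname);
            }
        }
    }
    finally
    {
        if (document != null)
        {
            document.close();
        }
    }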
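The snippet above uses the PDFBox 1.8 API; loadNonSeq, getAllPages, PDResources.getImages, PDXObjectImage and write2file were all removed in PDFBox 2.0. Below is a minimal sketch of the same listing against the 2.0.x API, assuming PDFBox 2.0 is on the classpath; the class and method names (ListImageNames, listImages) are hypothetical. Like the original, it only looks at image XObjects referenced directly from each page's resources, so images nested inside form XObjects are not reported.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

public class ListImageNames
{
    // Prints page number, resource key, size and suffix of every image XObject
    // referenced directly from a page's resources; returns how many were found.
    public static int listImages(File pdfFile) throws IOException
    {
        int count = 0;
        try (PDDocument document = PDDocument.load(pdfFile))
        {
            int p = 0;
            for (PDPage page : document.getPages())
            {
                ++p;
                PDResources resources = page.getResources();
                if (resources == null)
                {
                    continue; // page without any resources
                }
                for (COSName name : resources.getXObjectNames())
                {
                    PDXObject xobject = resources.getXObject(name);
                    if (xobject instanceof PDImageXObject)
                    {
                        PDImageXObject image = (PDImageXObject) xobject;
                        System.out.printf("page %d, image key '%s': %d x %d, type %s%n",
                                p, name.getName(), image.getWidth(), image.getHeight(), image.getSuffix());
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException
    {
        listImages(new File(args[0]));
    }
}
```

Run it as java ListImageNames file.pdf. In PDFBox 3.0, PDDocument.load(File) was replaced by Loader.loadPDF(File); the rest of the sketch should carry over. To save an image, PDImageXObject.getImage() returns a BufferedImage that can be written out with ImageIO.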

However, this won't help you much unless you are sure that all these images were placed into the PDF directly, i.e. without rotation, translation or scaling. If you need that placement information, have a look at the PrintImageLocations.java example in the PDFBox source download.
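On the question's actual goal of shrinking the PDF: the resource key identifies an image within a page, so it is what you would use to swap in a smaller version, while the recompression itself is plain Java imaging work. A minimal sketch using only javax.imageio, assuming the image is already available as a BufferedImage (for example from PDImageXObject.getImage() in PDFBox 2.x); the class name JpegRecompress and the quality value are illustrative assumptions, not part of PDFBox.

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;

import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;

public class JpegRecompress
{
    // Re-encodes an image as JPEG at the given quality (0.0f = smallest file, 1.0f = best quality).
    public static byte[] compress(BufferedImage image, float quality) throws IOException
    {
        // JPEG has no alpha channel, so copy the pixels into a plain RGB image first
        BufferedImage rgb = new BufferedImage(image.getWidth(), image.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        g.drawImage(image, 0, 0, null);
        g.dispose();

        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpeg");
        if (!writers.hasNext())
        {
            throw new IOException("no JPEG writer available");
        }
        ImageWriter writer = writers.next();

        // ask the writer for an explicit compression quality instead of its default
        ImageWriteParam param = writer.getDefaultWriteParam();
        param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
        param.setCompressionQuality(quality);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = ImageIO.createImageOutputStream(bytes))
        {
            writer.setOutput(out);
            writer.write(null, new IIOImage(rgb, null, null), param);
        }
        finally
        {
            writer.dispose();
        }
        return bytes.toByteArray();
    }
}
```

JPEG is lossy and drops transparency, so this mainly pays off for photographic images; line art and screenshots usually stay smaller with lossless encoding. Writing the result back into the PDF (for example via PDFBox's JPEGFactory, replacing the XObject under the same key) is a separate step not shown here.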
