I’m using Apache PDFBox to scan through a PDF in search of links to a certain file.
I’ve got about a thousand PDFs to scan, and most of the links (in fact all but one, as far as I can see so far) are found.
However, there is one particular link in a PDF that PDFBox simply ignores. If I open the PDF with Foxit and check the link’s properties, it looks exactly like all the other links (that do get found).
Here’s the code I use to iterate through the links:
    for( Object p : pages ) {
        PDPage page = (PDPage)p;
        List<?> annotations = page.getAnnotations();
        for( Object a : annotations ) {
            PDAnnotation annotation = (PDAnnotation)a;
            if( annotation instanceof PDAnnotationLink ) {
                PDAnnotationLink link = (PDAnnotationLink)annotation;
                /* Do stuff with the link */
            }
        }
    }
In the affected PDF, page.getAnnotations() returns an empty list for the page that contains the link.
Is there any other type of link besides the annotations that I should be aware of?
I took a look at the annot dictionary. It looks like this:
I can’t see anything wrong with it. It is also referenced correctly from the Annots entry in the page. Sorry I cannot be of more help.
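One way to cross-check this independently of PDFBox is to count the raw "/Subtype /Link" entries in the file's bytes. The following is only a rough sketch using the Java standard library; the class name RawLinkScan and the method countLinkSubtypes are made up for illustration. It assumes the annotation dictionaries are stored as plain objects: if the file keeps them inside compressed object streams (possible from PDF 1.5 on), this scan will not see them.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: a crude raw-byte diagnostic.
public class RawLinkScan {

    // "/Subtype" followed by optional whitespace and "/Link"; the lookahead
    // rejects longer names that merely start with "Link".
    private static final Pattern LINK_SUBTYPE =
            Pattern.compile("/Subtype\\s*/Link(?![A-Za-z0-9])");

    static int countLinkSubtypes(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data
        // in the file cannot break the decoding.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        Matcher m = LINK_SUBTYPE.matcher(raw);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java RawLinkScan <file.pdf>");
            return;
        }
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(countLinkSubtypes(bytes) + " raw /Subtype /Link entries");
    }
}
```

If this reports link entries for a file where page.getAnnotations() comes back empty, the annotation is present in the file and the problem is more likely in how the page's Annots entry gets resolved than in the link itself.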