I am using BIRT to generate PDF reports that contain graphs of data and a table of the data. I have TestNG unit tests that create a BIRT PDF, and I want to compare each created PDF with a baseline report. I can’t use an MD5 hash because each report is timestamped in the footer and the timestamp always changes. I tried using PDPage and PDResources to get all the images from the PDF, but the graphs don’t seem to be stored as images: the call to getImages on the PDResources object returns 0 images. Using PDFBox, what other elements of the PDF can I grab and compare with the baseline PDF to verify equality? The format of the PDF is as follows: page 1 contains a title, a start date/time label, an end date/time label, and a report note, followed by one or more graphs, followed by one table.
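The solution I used was to read each page's content stream from the BIRT PDF report with PDFBox, remove the pre-defined footer text (matched by a regular expression) from it, and collect the results in an ArrayList. I did the same for the baseline report and then compared the two ArrayLists.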
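    import java.util.Iterator;
    import java.util.List;
    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.pdmodel.PDPage;

    // PDFBox 1.x API (getAllPages() is not available in 2.x)
    PDDocument pdfDoc = PDDocument.load(docname);
    List pages = pdfDoc.getDocumentCatalog().getAllPages();
    Iterator iter = pages.iterator();
    while (iter.hasNext()) {
        PDPage page = (PDPage) iter.next();
        // Decoded content stream of the page, as one string
        String unparsedLine = page.getContents().getInputStreamAsString();
        // DATE_FOOTER_FORMAT is a regex matching the timestamped footer
        documentStreamList.add(unparsedLine.replaceAll(DATE_FOOTER_FORMAT, ""));
    }
    pdfDoc.close();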
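This code builds a list with one string per page, holding that page's content stream with the footer text removed. Running it on both the generated report and the baseline BIRT report gives two lists that can be compared. Both reports are parsed the same way, so the lists match when the reports are the same. I never found a way to compare the generated graphs directly.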
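The comparison step itself is not shown above, so here is a minimal sketch of how it might look using only the JDK. It assumes the per-page strings have already been extracted. The class name `PageStreamComparer`, the helper names, and the timestamp pattern are all hypothetical, and the real `DATE_FOOTER_FORMAT` depends on what the report footer actually looks like.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class PageStreamComparer {
    // Assumption: the footer timestamp looks like "2013-01-01 08:00:00".
    // Replace this with a pattern that matches your own report footer.
    static final Pattern DATE_FOOTER_FORMAT =
            Pattern.compile("\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}");

    // Returns a copy of the page strings with every footer timestamp removed.
    static List<String> stripFooter(List<String> pageStreams) {
        List<String> cleaned = new ArrayList<>();
        for (String stream : pageStreams) {
            cleaned.add(DATE_FOOTER_FORMAT.matcher(stream).replaceAll(""));
        }
        return cleaned;
    }

    // Returns the index of the first page that differs, or -1 if the lists are equal.
    // If one list is a prefix of the other, the first extra page index is returned.
    static int firstMismatch(List<String> baseline, List<String> generated) {
        int shared = Math.min(baseline.size(), generated.size());
        for (int i = 0; i < shared; i++) {
            if (!baseline.get(i).equals(generated.get(i))) {
                return i;
            }
        }
        return baseline.size() == generated.size() ? -1 : shared;
    }

    public static void main(String[] args) {
        // Stand-in strings; in a real test these come from the PDFBox loop above.
        List<String> baseline = Arrays.asList(
                "BT (Report A) Tj ET", "BT (Generated 2013-01-01 08:00:00) Tj ET");
        List<String> generated = Arrays.asList(
                "BT (Report A) Tj ET", "BT (Generated 2013-06-15 17:30:12) Tj ET");
        int mismatch = firstMismatch(stripFooter(baseline), stripFooter(generated));
        System.out.println(mismatch < 0
                ? "reports match"
                : "first difference on page " + (mismatch + 1));
    }
}
```

Returning the index of the first differing page, rather than a plain boolean, makes a failing TestNG assertion easier to diagnose, since the message can name the page to inspect.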