I have a Java batch job that prints 1 million single-page PDF documents.
The batch job runs every 5 days.
Which method is best for printing 1 million single-page PDF documents from a batch job?
Most of the text in the PDF is the same for every customer; only a few values (Customer Id / Name / Due Date / Expiry Date / Amount) are picked dynamically from the database.
We have tried the following:
1) Jasper Report
2) iText
Neither of these performs well enough, because the static text / paragraphs are re-created at runtime for every document.
So I am thinking of an approach like this:
There will be a template with placeholders for the dynamic values (Customer Id / Name / Due Date / Expiry Date / Amount).
There will be a communication server, such as OpenOffice, that holds this template.
Our Java application, deployed on a web server, will fetch the dataset from the database and pass it to this communication server. The templates are already open in memory there, so only the placeholder values change for each record, and the filled template is saved as with a "Save As" command.
Is this approach achievable? If yes, which API / communication server is best?
Here is the Jasper Report code for reference:
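// Load the precompiled report; try-with-resources closes the stream afterwards
try (InputStream is = getClass().getResourceAsStream("/jasperreports/reports/" + reportName + ".jasper")) {
    JasperPrint print = JasperFillManager.fillReport(is, parameters, dataSource);
    // createTempFile takes (prefix, suffix); pass ".pdf" as the suffix so the file name ends in .pdf
    pdf = File.createTempFile("report", ".pdf");
    JasperExportManager.exportReportToPdfFile(print, pdf.getPath());
}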
As a couple of posters mentioned, generating 1 million PDF files every 5 days means sustaining a rate of over 2 documents per second. That is achievable from a pure document-generation standpoint, but keep in mind that the systems running the queries and compiling the data will also be under a fair amount of load. You also haven’t said anything about the PDFs: a one-page PDF is much easier to generate than a 40-page PDF…
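The rate above comes from simple arithmetic, which can be sketched as follows. The class and method names are made up for illustration, and the 8-hour window is only a hypothetical example of a tighter batch window, not something stated in the question.

```java
import java.time.Duration;
import java.util.Locale;

public class RequiredRate {
    // Average documents per second needed to finish `documents` within `window`.
    static double requiredPerSecond(long documents, Duration window) {
        return (double) documents / window.getSeconds();
    }

    public static void main(String[] args) {
        long documents = 1_000_000L;

        // Assumption: generation may be spread over the full 5 days between runs.
        double fullPeriod = requiredPerSecond(documents, Duration.ofDays(5));
        System.out.println(String.format(Locale.ROOT, "5 days:  %.2f docs/s", fullPeriod));

        // Assumption (hypothetical): the job must finish inside an 8-hour window.
        double shortWindow = requiredPerSecond(documents, Duration.ofHours(8));
        System.out.println(String.format(Locale.ROOT, "8 hours: %.2f docs/s", shortWindow));
    }
}
```

Spread over the full 5 days this works out to roughly 2.3 documents per second; squeezing the same job into 8 hours would need roughly 35 per second, which is why the available window matters as much as the technology.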
I have seen iText and Docmosis achieve tens of documents per second, so Jasper and other technologies probably could as well. I mention Docmosis because it works along the lines of the technique you describe (populating templates loaded into memory). Please note that I work for the company that produces Docmosis.
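To make the "template loaded into memory" idea concrete, here is a minimal plain-Java sketch. It is not the Docmosis, iText or Jasper API, and it works on plain text rather than PDF. The class name, the `${...}` placeholder syntax and the field names are all assumptions for illustration. The point is only that the static text is parsed once and each document pays just for substituting its own values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InMemoryTemplate {
    // Hypothetical placeholder syntax: ${fieldName}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\w+)\\}");

    // Static text pieces and the field names between them, computed once.
    private final List<String> literals = new ArrayList<>();
    private final List<String> fields = new ArrayList<>();

    InMemoryTemplate(String template) {
        Matcher m = PLACEHOLDER.matcher(template);
        int last = 0;
        while (m.find()) {
            literals.add(template.substring(last, m.start()));
            fields.add(m.group(1));
            last = m.end();
        }
        literals.add(template.substring(last)); // trailing static text
    }

    // Per-document work: stitch the cached static pieces together with this row's values.
    String fill(Map<String, String> row) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            out.append(literals.get(i));
            out.append(row.getOrDefault(fields.get(i), "")); // missing values become empty
        }
        out.append(literals.get(fields.size()));
        return out.toString();
    }

    public static void main(String[] args) {
        // The template is parsed once, then reused for every customer row.
        InMemoryTemplate template = new InMemoryTemplate(
            "Dear ${name} (customer ${customerId}), amount ${amount} is due on ${dueDate}.");
        System.out.println(template.fill(Map.of(
            "name", "A. Customer", "customerId", "42",
            "amount", "100.00", "dueDate", "2024-01-31")));
    }
}
```

A real PDF pipeline would do the equivalent with whatever template mechanism the chosen library provides; the sketch only shows where the per-document cost goes.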
If you haven’t already, consider the hardware/software architecture and run trials with whichever technologies you are evaluating, to make sure you can get the performance you require. Presumably the peak load will be somewhat higher than the average load.
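A trial of this kind can be as simple as timing a fixed number of documents. The harness below is a rough sketch: `ThroughputTrial`, `measure` and the fake workload are hypothetical names, and the workload is a stand-in that should be replaced with the real query plus PDF generation call for each candidate technology.

```java
import java.util.function.IntConsumer;

public class ThroughputTrial {
    // Runs `generator` once per document index and returns documents per second.
    static double measure(IntConsumer generator, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            generator.accept(i);
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start); // avoid divide-by-zero
        return count / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        // Stand-in workload (assumption); replace with the real generation step.
        IntConsumer fakeGenerator = i -> {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < 1_000; j++) {
                sb.append(i);
            }
            if (sb.length() == 0) {
                throw new IllegalStateException("no output produced");
            }
        };

        measure(fakeGenerator, 1_000); // warm-up run so JIT compilation skews the result less
        double perSecond = measure(fakeGenerator, 10_000);
        System.out.println("measured docs/s: " + perSecond);
    }
}
```

Run it on hardware comparable to production and with the database in the loop, since a number measured against an idle in-memory stub says little about sustained or peak load.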
Good luck.