Is there a free Java library for extracting text from PDFs that is compatible with Google App Engine?
I’ve read about PDFJet, but it can’t read PDFs, can it?
Is there perhaps another way to extract text from a PDF? I tried http://www.pdfdownload.org/, but unfortunately it doesn’t handle non-English characters correctly.
iText now has a text parsing module (I’m one of the parser authors). See the com.itextpdf.text.pdf.parser.PdfContentReaderTool class for an example of how to use it.
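For a rough idea of what calling the parser looks like, here is a minimal sketch. It assumes an iText 5.x jar on the classpath and uses the PdfTextExtractor helper from the same parser package. The class name PdfTextSketch and the method extractText are made up for illustration and are not part of iText. It reads the PDF from a byte array instead of a file path, on the assumption that on App Engine you would be working with uploaded or fetched bytes.

```java
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

import java.io.IOException;

public class PdfTextSketch {

    // Hypothetical helper (not part of iText): returns the text of every page,
    // one page after another, separated by newlines.
    public static String extractText(byte[] pdfBytes) throws IOException {
        // PdfReader can parse a PDF held entirely in memory.
        PdfReader reader = new PdfReader(pdfBytes);
        try {
            StringBuilder text = new StringBuilder();
            // iText page numbers start at 1.
            for (int page = 1; page <= reader.getNumberOfPages(); page++) {
                text.append(PdfTextExtractor.getTextFromPage(reader, page));
                text.append('\n');
            }
            return text.toString();
        } finally {
            reader.close();
        }
    }
}
```

Whether non-English text comes out correctly depends on how the fonts are encoded in the particular PDF, so it is worth trying this on a few of your own files first.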