Is there a way I can get the content of a PDF file (“example.pdf”) into an iText object like a Paragraph or a Chunk?
I need to use that content, among other text, in a new PDF I am generating.
No, at least not easily.
When iText writes Chunks, Paragraphs and similar objects into a PDF (or when other PDF-creating programs write their respective objects), the information that “the words from here to there form a paragraph” or “these words form a chapter” is generally lost. All that remains are multiple positioned letter groups. (OK, there can be more information, but mostly there isn’t.)
What you can do, though, is parse the content of the PDF, e.g. using the classes in the iText parser package, to retrieve those positioned letter groups, and then apply some heuristics to guess which of them form a paragraph, a chapter, or whatever.
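To give an idea of what such a heuristic might look like, here is a minimal sketch in plain Java. It does not call iText at all: `TextFragment` is a hypothetical stand-in for one positioned letter group (in iText 5 you would probably fill it from a `RenderListener` in `com.itextpdf.text.pdf.parser`, but that wiring is left out here). The class name `ParagraphGuesser` and the two thresholds are also assumptions you would need to tune per document. The rule is simply: fragments on nearly the same baseline form a line, and a vertical gap larger than `paragraphGap` starts a new paragraph.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ParagraphGuesser {

    /** Hypothetical stand-in for one positioned letter group (not an iText class). */
    public static final class TextFragment {
        public final String text;
        public final float x; // left edge of the fragment
        public final float y; // baseline; PDF y coordinates grow upward

        public TextFragment(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Groups fragments into lines (baselines within lineTolerance of each other)
     * and lines into paragraphs (baseline distance of at most paragraphGap).
     */
    public static List<String> guessParagraphs(List<TextFragment> fragments,
                                               float lineTolerance,
                                               float paragraphGap) {
        // Sort top to bottom: highest baseline first.
        List<TextFragment> sorted = new ArrayList<>(fragments);
        sorted.sort(Comparator.comparingDouble((TextFragment f) -> f.y).reversed());

        // Step 1: collect fragments that share (roughly) the same baseline.
        List<List<TextFragment>> lines = new ArrayList<>();
        for (TextFragment f : sorted) {
            List<TextFragment> last = lines.isEmpty() ? null : lines.get(lines.size() - 1);
            if (last != null && Math.abs(last.get(0).y - f.y) <= lineTolerance) {
                last.add(f);
            } else {
                List<TextFragment> line = new ArrayList<>();
                line.add(f);
                lines.add(line);
            }
        }

        // Step 2: join each line left to right, then merge lines into paragraphs.
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        float previousY = 0f;
        for (List<TextFragment> line : lines) {
            line.sort(Comparator.comparingDouble((TextFragment f) -> f.x));
            float lineY = line.get(0).y;
            // A large vertical jump is taken as a paragraph break.
            if (current.length() > 0 && previousY - lineY > paragraphGap) {
                paragraphs.add(current.toString());
                current = new StringBuilder();
            }
            for (TextFragment f : line) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(f.text);
            }
            previousY = lineY;
        }
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }
}
```

Each resulting string could then be wrapped in a `new Paragraph(...)` when you build the new PDF. Keep in mind that this is only a guess at the structure: multi-column layouts, tables, rotated text and hyphenation will all need additional rules.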