I am using libpoppler to parse a PDF file to plain text, and I want to output the page header, page footer and content separately. How can I do this?
Is there any structure or class that holds them?
Thanks in advance!
You can get the text of a page with poppler_page_get_text(). Can you parse the plain text afterwards?
Here is some sample code. It's not C++, but I hope you can see the idea.
Tested on Debian Unstable amd64, libpoppler-glib-dev 0.18.4-3, gcc 4.7.1-7.
$ gcc -Wall -g -Wextra get-text.c $(pkg-config --cflags --libs poppler-glib)
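As far as I know there is no dedicated header/footer structure to read, so one way to do the "parse afterwards" step is a position heuristic: treat anything in a thin band at the top of the page as header and anything in a band at the bottom as footer. The sketch below is only that heuristic in plain C. The names Region and classify_box and the margin_frac parameter are made up for illustration and are not part of poppler, and it assumes the y coordinates are measured from the top of the page, which you should check against your own output.

```c
#include <assert.h>

/* Hypothetical labels for illustration; not a poppler type. */
typedef enum { REGION_HEADER, REGION_BODY, REGION_FOOTER } Region;

/*
 * Classify one text box by its vertical position on the page.
 *   y_top, y_bottom : box edges, ASSUMED to be measured from the top of the page
 *   page_height     : page height in the same units (e.g. PDF points)
 *   margin_frac     : fraction of the page height treated as the header band
 *                     and as the footer band (a guess you must tune, e.g. 0.08)
 */
static Region classify_box(double y_top, double y_bottom,
                           double page_height, double margin_frac)
{
    assert(page_height > 0.0);
    assert(margin_frac >= 0.0 && margin_frac < 0.5);

    double band = page_height * margin_frac;

    /* Box lies entirely inside the top band. */
    if (y_bottom <= band)
        return REGION_HEADER;

    /* Box lies entirely inside the bottom band. */
    if (y_top >= page_height - band)
        return REGION_FOOTER;

    return REGION_BODY;
}
```

For example, on an A4 page (842 points high) with margin_frac = 0.08, a box spanning y = 20..32 would be classified as header and one spanning y = 800..812 as footer. The page height could come from poppler_page_get_size() and the per-character boxes from poppler_page_get_text_layout(), if your poppler-glib version provides it. A fixed band will misfire on title pages or pages without a header, so comparing which lines repeat across pages is probably a useful second check.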