I have a PDF file with valuable textual information.
The problem is that I cannot extract the text: all I get is a bunch of garbled symbols. The same happens if I copy and paste the text from the PDF reader into a text file. Even File -> Save as text in Acrobat Reader fails.
I have tried every tool I could get my hands on, and the result is always the same. I believe this has something to do with font embedding, but I don’t know what exactly.
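One rough way to test the font suspicion, offered as a sketch rather than a definitive tool: as far as I understand, when an embedded font uses a custom encoding and has no /ToUnicode map, a reader cannot map glyph codes back to real characters, so copy and paste yields garbage. The Python below scans the raw file for font dictionaries that lack a /ToUnicode entry. The function name `fonts_without_tounicode` is made up for illustration, and the scan only sees uncompressed objects, so fonts stored inside compressed object streams (common in newer PDFs) will be missed.

```python
import re
import sys

# Crude pattern for an indirect object: "<num> <gen> obj ... endobj".
OBJ = re.compile(rb"(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)
# A dictionary that declares itself a font (this does not match /FontDescriptor).
FONT_DICT = re.compile(rb"/Type\s*/Font\b")
# Descendant CID fonts carry no /ToUnicode themselves (it sits on the parent), so skip them.
CID_FONT = re.compile(rb"/Subtype\s*/CIDFontType")
TOUNICODE = re.compile(rb"/ToUnicode\b")


def fonts_without_tounicode(pdf_bytes):
    """Return object numbers of uncompressed font dictionaries lacking /ToUnicode.

    Hypothetical helper: a heuristic over raw bytes, not a real PDF parser.
    """
    missing = []
    for match in OBJ.finditer(pdf_bytes):
        body = match.group(3)
        if not FONT_DICT.search(body) or CID_FONT.search(body):
            continue
        if not TOUNICODE.search(body):
            missing.append(int(match.group(1)))
    return missing


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        data = handle.read()
    for number in fonts_without_tounicode(data):
        print(f"font object {number} has no /ToUnicode map")
```

Fonts reported here are only candidates: a font with a standard encoding can extract fine without /ToUnicode, and a font whose /ToUnicode map is present but wrong will garble text without being reported.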
My questions:
- What causes this text garbling?
- How can I extract the text content from the PDF (programmatically, with a tool, by manipulating the bits directly, etc.)?
- How can I fix the PDF so that it does not garble on copy?
I have asked a lot of people for help, and so far OCR is the only solution anyone has suggested.