I have some PDF files that I need to modify using a PHP script. I can also call exec(), so pretty much anything that runs on CentOS is an option.
When opened in Adobe Acrobat Pro X, the PDF files show 2 layers in the “Layers” panel:
- Background
- Color
When I disable both of these layers, I end up with black & white text and images (the text is not vector, though; it’s a scanned document).
I want to disable these layers and any other similar layer found in the PDFs using PHP and/or C# or any command-line tool.
Other useful information:
When I run pdfimages (provided with XPDF) on my PDFs, it extracts exactly what I actually need removed from each page…
Additional Information Update:
I modified line 28 of the PDFSharp example at http://www.pdfsharp.net/wiki/ExportImages-sample.ashx, changing:
ExportImage(xObject, ref imageCount);
To:
PdfObject obj = xObject.Elements.GetObject("/OC");
Console.WriteLine(obj);
I got the following output in the console for each image:
<< /Name Background /Type /OCG >>
<< /OCGs [ 2234 0 R ] /P /AllOff /Type /OCMD >>
<< /Name Text Color /Type /OCG >>
This is actually the layer information. The PDFSharp documentation says the following about the /OC key:
Before the image is processed, its visibility is determined based on this entry. If it is determined to be invisible, the entire image is skipped, as if there were no Do operator to invoke it.
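To make the visibility rule concrete, here is a minimal sketch of how I understand the two kinds of /OC values above to be evaluated. It does not use PDFSharp at all; the class name, the Policy enum and the name-to-state map are made up for illustration, and treating an unlisted group as ON is an assumption of the sketch. A plain /OCG is visible when its layer is ON, while an /OCMD combines the states of the groups in its /OCGs array according to /P.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OptionalContentSketch
{
    // Values an OCMD may carry in /P; AnyOn applies when /P is absent.
    public enum Policy { AllOn, AnyOn, AnyOff, AllOff }

    // A plain /OCG is visible exactly when its state is ON.
    // Groups missing from the map are treated as ON (assumption of this sketch).
    public static bool IsGroupOn(string name, IDictionary<string, bool> states)
    {
        bool on;
        return states.TryGetValue(name, out on) ? on : true;
    }

    // An /OCMD combines the states of the groups listed in its /OCGs array
    // according to its /P policy.
    public static bool IsMembershipVisible(
        Policy policy, IEnumerable<string> groups, IDictionary<string, bool> states)
    {
        List<bool> on = groups.Select(g => IsGroupOn(g, states)).ToList();
        if (on.Count == 0) return true; // an empty /OCGs array hides nothing

        switch (policy)
        {
            case Policy.AllOn:  return on.All(s => s);
            case Policy.AnyOff: return on.Any(s => !s);
            case Policy.AllOff: return on.All(s => !s);
            default:            return on.Any(s => s); // AnyOn
        }
    }
}
```

If 2234 0 R happens to be the Background group (I have not verified which layer it points to), then with Background switched OFF the /AllOff image becomes visible while the images tagged directly with the Background OCG are skipped. That would match what Acrobat shows when the layers are disabled.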
So now, how do I modify the /OC value to something that will make these layers invisible?
After long hours of experimenting, I found the way! I’m posting the code so someone may find it helpful in the future: