I need to post several (read: a lot of) PDF files to the web, but many of them have hard-coded file:// links and links to non-public locations. I need to read through these PDFs and update the links to the proper locations. I’ve started writing an app using iTextSharp to read through the directories and files, find the PDFs and iterate through each page. What I need to do next is find the links and then update the incorrect ones.
string path = "c:\\html";
DirectoryInfo rootFolder = new DirectoryInfo(path);
foreach (DirectoryInfo di in rootFolder.GetDirectories())
{
    // get pdf
    foreach (FileInfo pdf in di.GetFiles("*.pdf"))
    {
        string contents = string.Empty;
        Document doc = new Document();
        PdfReader reader = new PdfReader(pdf.FullName);
        using (MemoryStream ms = new MemoryStream())
        {
            PdfWriter writer = PdfWriter.GetInstance(doc, ms);
            doc.Open();
            for (int p = 1; p <= reader.NumberOfPages; p++)
            {
                byte[] bt = reader.GetPageContent(p);
            }
        }
    }
}
Quite frankly, once I get the page content I’m rather lost when it comes to iTextSharp. I’ve read through the iTextSharp examples on SourceForge, but really didn’t find what I was looking for.
Any help would be greatly appreciated.
Thanks.
This one is a little complicated if you don’t know the internals of the PDF format and iText/iTextSharp’s abstraction/implementation of it. You need to understand how to use PdfDictionary objects and look things up by their PdfName key. Once you get that you can read through the official PDF spec and poke around a document pretty easily. If you do care, I’ve included the relevant parts of the PDF spec in parentheses where applicable.
Anyways, a link within a PDF is stored as an annotation (PDF Ref 12.5). Annotations are page-based, so you need to first get each page’s annotation array individually. There are a bunch of different possible types of annotations, so you need to check each one’s SUBTYPE and see if it’s set to LINK (12.5.6.5). Every link should have an ACTION dictionary associated with it (12.6.2), and you want to check the action’s S key to see what type of action it is. There are a bunch of possible ones for this; links specifically could be internal links, open-file links, play-sound links or something else (12.6.4.1). You are looking only for links that are of type URI (note the letter I and not the letter L). URI actions (12.6.4.7) have a URI key that holds the actual address to navigate to. (There’s also an IsMap property for image maps that I can’t actually imagine anyone using.)
Whew. Still reading? Below is a full working VS 2010 C# WinForms app based on my post here targeting iTextSharp 5.1.1.0. This code does two main things: 1) creates a sample PDF with a link in it pointing to Google.com and 2) replaces that link with a link to bing.com. The code should be pretty well commented, but feel free to ask any questions that you might have.
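For the case in the question, the lookup chain described above (page, annotation array, SUBTYPE of LINK, ACTION dictionary, S of URI, URI key) might be condensed into something like the sketch below. This is not the WinForms app itself, just a minimal illustration assuming iTextSharp 5.x. The class name LinkFixer, the RewriteUri helper and both prefixes are hypothetical placeholders, and only URI actions are handled, so links stored as other action types would be left untouched.

```csharp
using System;
using System.IO;
using iTextSharp.text.pdf;

static class LinkFixer
{
    // Hypothetical mapping rule: both prefixes are placeholders, not values from the post.
    const string OldPrefix = "file:///c:/html/";
    const string NewPrefix = "http://www.example.com/";

    // Returns the replacement address, or null when the link should be left alone.
    public static string RewriteUri(string uri)
    {
        if (uri != null && uri.StartsWith(OldPrefix, StringComparison.OrdinalIgnoreCase))
            return NewPrefix + uri.Substring(OldPrefix.Length);
        return null;
    }

    // Walks every page's annotation array, rewrites matching URI actions,
    // saves the result to outputFile and returns the number of links changed.
    public static int FixLinks(string inputFile, string outputFile)
    {
        int changed = 0;
        PdfReader reader = new PdfReader(inputFile);
        try
        {
            for (int p = 1; p <= reader.NumberOfPages; p++)
            {
                // Annotations are stored per page (PDF Ref 12.5)
                PdfDictionary page = reader.GetPageN(p);
                PdfArray annots = page.GetAsArray(PdfName.ANNOTS);
                if (annots == null) continue;

                for (int i = 0; i < annots.Size; i++)
                {
                    PdfDictionary annot = annots.GetAsDict(i);
                    if (annot == null) continue;

                    // Only link annotations are of interest (12.5.6.5)
                    if (!PdfName.LINK.Equals(annot.GetAsName(PdfName.SUBTYPE))) continue;

                    // The action dictionary lives under the A key (12.6.2)
                    PdfDictionary action = annot.GetAsDict(PdfName.A);
                    if (action == null) continue;

                    // Skip anything that is not a URI action (12.6.4.7)
                    if (!PdfName.URI.Equals(action.GetAsName(PdfName.S))) continue;

                    PdfString target = action.GetAsString(PdfName.URI);
                    if (target == null) continue;

                    string replacement = RewriteUri(target.ToString());
                    if (replacement == null) continue;

                    action.Put(PdfName.URI, new PdfString(replacement));
                    changed++;
                }
            }

            // The stamper writes the reader's modified dictionaries to the new file
            using (FileStream fs = new FileStream(outputFile, FileMode.Create, FileAccess.Write))
            {
                PdfStamper stamper = new PdfStamper(reader, fs);
                stamper.Close();
            }
        }
        finally
        {
            reader.Close();
        }
        return changed;
    }
}
```

Inside the directory loop from the question, this could be called as LinkFixer.FixLinks(pdf.FullName, someOtherPath), where someOtherPath is a different file from the input so the reader is not overwriting the file it is still reading.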
EDIT
I should note, this only changes the actual link. Any text within the document won’t get updated. Annotations are drawn on top of text but aren’t really tied to the text underneath in any way. That’s another topic completely.