I want to learn how to generate a PDF. I don’t want to use any third-party tools; I want to create it myself in code. The only examples I have seen so far come from opening third-party DLLs in Reflector to see what is happening. Unfortunately, the DLLs I have looked at seem to call into user32.dll and gdi32.dll to help create the PDF document. My issue is that I have no idea what they are doing and, more importantly, why.
Does anyone have any good tutorials or references that might point me in the right direction?
Thanks in advance.
The spec is the ultimate guide. Here is what you will ultimately have to do:
The header is easy – it defines that the file is PDF and the version.
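As a rough illustration of the header, here is a minimal Python sketch. The function name `pdf_header` and the default version are my own choices for illustration. The second line is a comment containing bytes above 127, which the spec recommends so that transfer tools treat the file as binary; it is optional.

```python
def pdf_header(version: str = "1.4") -> bytes:
    """Return the opening bytes of a PDF file (hypothetical helper)."""
    # First line: "%PDF-" followed by the version number.
    first_line = f"%PDF-{version}\n".encode("ascii")
    # Optional second line: a comment holding four bytes >= 128, hinting
    # to other tools that the file contains binary data.
    binary_marker = b"%\xe2\xe3\xcf\xd3\n"
    return first_line + binary_marker
```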
Objects are the data types in PDF. These include bool, number, string, list/array, dictionary and stream.
Objects are either written directly or indirectly.
Direct objects are written as is.
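To make the object types concrete, here is a simplified Python sketch that writes a Python value as a direct PDF object. The `serialize` function and the `Name` class are hypothetical names of mine, not part of any library. PDF dictionary keys are names (written with a leading slash), which is why a marker type for them is needed. Streams are left out because they must always be indirect objects, and string handling is limited to Latin-1 text with the basic escapes.

```python
class Name(str):
    """Hypothetical marker type for PDF names such as /Type."""


def serialize(value) -> bytes:
    """Write a Python value as a direct PDF object (simplified sketch)."""
    # bool must be tested before int because bool is a subclass of int.
    if isinstance(value, bool):
        return b"true" if value else b"false"
    if isinstance(value, int):
        return str(value).encode("ascii")
    if isinstance(value, float):
        # PDF numbers have no exponent form, so use fixed-point notation.
        text = f"{value:.6f}".rstrip("0").rstrip(".")
        return text.encode("ascii")
    # Name must be tested before str because Name is a subclass of str.
    if isinstance(value, Name):
        return b"/" + value.encode("ascii")
    if isinstance(value, str):
        # Literal strings are wrapped in parentheses; escape the
        # backslash first, then the parentheses themselves.
        escaped = (
            value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        )
        return b"(" + escaped.encode("latin-1") + b")"
    if isinstance(value, (list, tuple)):
        return b"[" + b" ".join(serialize(item) for item in value) + b"]"
    if isinstance(value, dict):
        # Dictionary keys are always names; values can be any object.
        items = b" ".join(
            serialize(Name(key)) + b" " + serialize(item)
            for key, item in value.items()
        )
        return b"<< " + items + b" >>"
    raise TypeError(f"cannot serialize {type(value).__name__}")
```

For instance, `serialize({"Type": Name("Catalog")})` produces the bytes `<< /Type /Catalog >>`.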
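Indirect objects are written as an object id and a generation number followed by the keyword obj, then the object itself, then the keyword endobj: <id> <generation> obj <object> endobj
For example, I could write a string as: 1 0 obj (this is a string) endobj
And whenever I want to use that string elsewhere, I just have to use an indirect reference, which is defined as: <id> <generation> R
In this case, I could refer to my string as: 1 0 R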
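The indirect object and indirect reference syntax can be sketched in Python as follows. Both function names are hypothetical, and the body is assumed to be an already serialized object.

```python
def indirect_object(obj_id: int, generation: int, body: bytes) -> bytes:
    """Wrap an already serialized object as an indirect object."""
    # Layout: "<id> <generation> obj", the object itself, then "endobj".
    return b"%d %d obj\n" % (obj_id, generation) + body + b"\nendobj\n"


def reference(obj_id: int, generation: int = 0) -> bytes:
    """Write an indirect reference to the object with this id."""
    return b"%d %d R" % (obj_id, generation)
```

With these, `indirect_object(1, 0, b"(this is a string)")` defines a string as object 1, and `reference(1)` yields `1 0 R`, which can be used anywhere that string is needed.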
To quickly find an object, there is a cross reference table that tells where an object of a particular id and generation lives in the file.
So, in addition to simply writing objects to the file, you have to keep track of the file position where indirect objects have been defined.
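Putting the pieces together, here is a minimal Python sketch of that bookkeeping: it writes three indirect objects for a single blank page, records the byte offset of each one, and then emits the cross-reference table and trailer. The function name `build_minimal_pdf`, the page size and the fixed object numbering are assumptions made for illustration; a real writer would allocate ids dynamically.

```python
import io


def build_minimal_pdf() -> bytes:
    """Build a one-page blank PDF while tracking object offsets."""
    # Already serialized bodies; list index i holds object number i + 1.
    bodies = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792]"
        b" /Resources << >> >>",
    ]
    out = io.BytesIO()
    out.write(b"%PDF-1.4\n")

    # Remember where each indirect object starts in the file.
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(out.tell())
        out.write(b"%d 0 obj\n" % number + body + b"\nendobj\n")

    # Cross-reference table: one 20-byte entry per object, plus the
    # mandatory free entry for object 0.
    xref_start = out.tell()
    size = len(bodies) + 1
    out.write(b"xref\n0 %d\n" % size)
    out.write(b"0000000000 65535 f \n")
    for offset in offsets:
        out.write(b"%010d 00000 n \n" % offset)

    # The trailer names the root object; startxref points at the table.
    out.write(b"trailer\n<< /Size %d /Root 1 0 R >>\n" % size)
    out.write(b"startxref\n%d\n%%%%EOF\n" % xref_start)
    return out.getvalue()


if __name__ == "__main__":
    with open("blank.pdf", "wb") as handle:
        handle.write(build_minimal_pdf())
```

Note that every offset is a byte position, so the file has to be written in binary mode; any newline translation would invalidate the table.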
All of this is doable, but you’re going to quickly find as you write these files that it becomes really challenging to make changes in your output stream and keep things neat and tidy. What’s worse is that other people have done this too, so now there is a pile of garbage PDFs out in the wild that Acrobat manages to cope with somehow. For example, Ghostscript (hopefully this is fixed) produced PDFs whose cross-reference tables were complete garbage: they pointed at nothing useful. Then there are producers that flat-out violate the spec by using the wrong data type for dictionary entries, and others that leave out spec-required information.
It’s fairly nightmarish to consume PDF.
Still, it’s an interesting exercise. But if you want to do anything significant, you need to start writing good tools that manage the indirect references, the cross-reference table, the dictionaries, the type checking and so on for you. In the end, you may find that an existing library would serve you better.
And being the author of tools that consume and generate PDF, I will plead that you don’t let any of your non-compliant PDFs out into the wild.