I am not able to search the text content of a PDF when the font type is Type0. After parsing the PDF I get a garbage string back. With a Type0 font I am also unable to scan the PDF content (CGPDFContentStreamRef). If anyone has prior experience with this, please help me out.
In Apple's developer documentation I saw that Apple supports only 3 types of PDF fonts:
- kCGFontPostScriptFormatType1 = 1,
- kCGFontPostScriptFormatType3 = 3,
- kCGFontPostScriptFormatType42 = 42
(reference: CGFont Reference)
Is it true?
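A Type0 (composite) font, as it is usually embedded with the Identity-H encoding, references glyphs directly (and not characters), bypassing the font's character-to-glyph cmap entirely. That is why the extracted string looks like garbage: the bytes are glyph IDs, not character codes. Also note that text in a Type0 font uses two bytes/octets per glyph ID, so a hex string operand such as <000100020003> followed by the Tj operator
will render glyph 1, 2, and then glyph 3.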
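As a minimal sketch of the two-bytes-per-glyph rule, the following Python splits a PDF hex string into glyph IDs. The function name `glyph_ids_from_hex` is made up for illustration, and it assumes the font really uses fixed two-byte codes (as with Identity-H); other encodings can use variable-length codes.

```python
def glyph_ids_from_hex(hex_string):
    """Split a PDF hex string such as '<000100020003>' into 2-byte glyph IDs."""
    # Drop the angle brackets and any whitespace between hex digits.
    digits = "".join(hex_string.strip().strip("<>").split())
    if len(digits) % 2:
        digits += "0"  # PDF treats a missing final hex digit as 0
    raw = bytes.fromhex(digits)
    if len(raw) % 2:
        raise ValueError("expected an even number of bytes for 2-byte glyph IDs")
    # Assumption: every code is exactly two bytes, big-endian.
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


print(glyph_ids_from_hex("<000100020003>"))  # [1, 2, 3]
```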
The font's PDF dictionary usually contains a ToUnicode entry that references a stream mapping each glyph ID back to one or more Unicode characters. This stream is a small text document that is fairly simple to parse.
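To give an idea of how simple the parsing can be, here is a rough Python sketch that reads the `beginbfchar` and `beginbfrange` sections of an already decompressed ToUnicode stream. The name `parse_to_unicode` and the sample data are invented for illustration, and the sketch assumes a well-formed stream whose destination values are UTF-16BE; it is not a complete CMap parser.

```python
import re

HEX = r"<([0-9A-Fa-f]+)>"


def parse_to_unicode(cmap_text):
    """Return {glyph_id: unicode_string} from the text of a ToUnicode stream."""
    mapping = {}
    # bfchar sections hold individual <source> <destination> pairs.
    for block in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for src, dst in re.findall(HEX + r"\s*" + HEX, block):
            mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    # bfrange sections hold <lo> <hi> <start> or <lo> <hi> [<d1> <d2> ...].
    triple = HEX + r"\s*" + HEX + r"\s*(<[0-9A-Fa-f]+>|\[.*?\])"
    for block in re.findall(r"beginbfrange(.*?)endbfrange", cmap_text, re.S):
        for lo, hi, dst in re.findall(triple, block, re.S):
            lo, hi = int(lo, 16), int(hi, 16)
            if dst.startswith("["):
                # One explicit destination per code in the range.
                targets = re.findall(HEX, dst)
                for offset, target in enumerate(targets[: hi - lo + 1]):
                    mapping[lo + offset] = bytes.fromhex(target).decode("utf-16-be")
            else:
                # Consecutive codes map to consecutive values of the last code unit.
                base = bytes.fromhex(dst[1:-1])
                prefix, last = base[:-2], int.from_bytes(base[-2:], "big")
                for offset in range(hi - lo + 1):
                    unit = (last + offset).to_bytes(2, "big")
                    mapping[lo + offset] = (prefix + unit).decode("utf-16-be")
    return mapping


SAMPLE = """
begincmap
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
2 beginbfchar
<0001> <0048>
<0002> <0069>
endbfchar
1 beginbfrange
<0003> <0005> <0041>
endbfrange
endcmap
"""

print(parse_to_unicode(SAMPLE))
```

In a real PDF the stream is normally compressed (for example with FlateDecode), so it has to be decoded before it can be handed to a function like this.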
Given the glyph IDs in the text-showing commands and the ToUnicode stream, you can derive the Unicode string that corresponds to the rendered glyphs.
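Putting the two pieces together could look roughly like this in Python. The mapping table here is hand-written for illustration (in practice it would come from the font's ToUnicode stream), `decode_text` is a hypothetical helper name, and the two-byte code size is again an assumption.

```python
def decode_text(string_bytes, to_unicode):
    """Map the raw bytes of a text-showing operand to a Unicode string."""
    if len(string_bytes) % 2:
        raise ValueError("expected 2 bytes per glyph ID")
    chars = []
    for i in range(0, len(string_bytes), 2):
        glyph_id = int.from_bytes(string_bytes[i:i + 2], "big")
        # Glyphs without an entry become U+FFFD instead of raising an error.
        chars.append(to_unicode.get(glyph_id, "\ufffd"))
    return "".join(chars)


# Hypothetical table; one glyph may map to several characters (a ligature).
to_unicode = {1: "H", 2: "i", 3: "!", 4: "fi"}

print(decode_text(b"\x00\x01\x00\x02\x00\x03", to_unicode))  # Hi!
```

Because the result is ordinary Unicode text, it can then be searched like any other string.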
If there is no ToUnicode… you're on your own. Perhaps the embedded font contains a cmap (unlikely, as this is usually stripped to save space) from which you can derive the information, but that is probably too far-fetched.