Can anyone suggest an easy-to-implement way to extract ToUnicode tables from a PDF? I can extract fonts using pdfextract from MuPDF; now I'm looking for a way to extract the ToUnicode tables for those fonts.
You can modify pdfextract to extract the ToUnicode CMaps (not tables, CMaps).
You might look at the code in savefont and add something that looks up the ToUnicode entry in the font dictionary.
If there is a ToUnicode entry (it is optional, so there need not be), you could dump its stream in much the same way the font stream is written to file.
Once that stream has been loaded into a buffer buf, buf->data (buf->len bytes) would contain the CMap, which you could write to a file or process however you like.
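As a rough illustration of the last step only, here is a minimal sketch in plain C that writes a block of bytes to a file. The function name write_cmap and the output path are hypothetical and not part of pdfextract or MuPDF; the sketch assumes the ToUnicode stream has already been loaded and decoded, so that you hold a pointer and a length.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not a MuPDF function): write `len` bytes starting
 * at `data` to the file named `path`.
 * Returns 0 on success, -1 if the file could not be opened or fully written.
 */
int write_cmap(const char *path, const unsigned char *data, size_t len)
{
    FILE *f = fopen(path, "wb");   /* binary mode: write the bytes untouched */
    size_t written;

    if (f == NULL)
        return -1;

    written = fwrite(data, 1, len, f);

    /* Always close the file; report failure on a close error or short write. */
    if (fclose(f) != 0 || written != len)
        return -1;

    return 0;
}
```

Inside savefont this would be called with something like write_cmap(name, buf->data, buf->len), where name is whatever output filename you choose. How the buffer's data and length are accessed depends on the MuPDF version you are building against, so treat that call as an assumption to check against your headers.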