Can anyone suggest an easy to implement way to extract ToUnicode tables from PDF?


Asked: May 26, 2026

Can anyone suggest an easy-to-implement way to extract ToUnicode tables from a PDF? I can already extract fonts using pdfextract from MuPDF; now I’m looking for a way to extract the ToUnicode tables for those fonts.


1 Answer

Editorial Team · Answer 1 · 2026-05-26T05:06:14+00:00

You can modify pdfextract to extract the ToUnicode CMaps (not tables, CMaps).
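On the CMap-versus-table distinction: a ToUnicode CMap is a stream of PostScript-like text, not a binary table, and its one-to-one mappings are listed between the beginbfchar and endbfchar keywords. As a rough sketch of what can be done with that text once it has been extracted, the helper below (print_bfchar_pairs is a hypothetical name, not part of MuPDF) lists those pairs. It assumes the CMap is already decompressed and NUL-terminated, and it ignores beginbfrange sections.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of MuPDF: scan a NUL-terminated ToUnicode
 * CMap for bfchar sections and print each <code> <text> pair.
 * Returns the number of pairs found. bfrange sections are ignored. */
static int print_bfchar_pairs(const char *cmap)
{
    int count = 0;
    const char *p = cmap;

    while ((p = strstr(p, "beginbfchar")) != NULL)
    {
        p += strlen("beginbfchar");
        for (;;)
        {
            char src[9], dst[65];
            int used = 0;

            /* Each entry is <hex character code> <hex UTF-16BE text>. */
            if (sscanf(p, " <%8[0-9A-Fa-f]> <%64[0-9A-Fa-f]>%n",
                       src, dst, &used) != 2 || used == 0)
                break; /* reached endbfchar or malformed input */

            printf("code %s -> UTF-16BE %s\n", src, dst);
            p += used;
            count++;
        }
    }
    return count;
}
```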

You might look at the code in savefont and add something like:

obj = fz_dict_gets(dict, "ToUnicode");
if (obj)
{
    stream = obj;
}

If there is a ToUnicode entry (it is optional, so there need not be), you could dump its stream in much the same way the font stream is written to file:

obj = fz_dict_gets(dict, "ToUnicode");
if (obj)
{
    stream = obj;
    buf = fz_new_buffer(0);

    error = pdf_load_stream(&buf, xref, fz_to_num(stream), fz_to_gen(stream));
    if (error)
        die(error);
    /* Do something with the data */
}

buf->data (buf->len bytes long) would then contain the CMap, which you could write to a file or process further.
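Writing the data out needs nothing MuPDF-specific. A minimal sketch using only the C standard library might look like this; write_cmap_file is a hypothetical helper name, and the output path is whatever you choose.

```c
#include <stdio.h>

/* Hypothetical helper, not part of MuPDF: write len bytes of CMap data
 * to the file at path. Returns 0 on success, -1 on failure. */
static int write_cmap_file(const char *path, const unsigned char *data, size_t len)
{
    FILE *f = fopen(path, "wb");
    size_t written;

    if (f == NULL)
        return -1;

    written = fwrite(data, 1, len, f);

    /* fclose can also report a write error, e.g. a failed final flush. */
    if (fclose(f) != 0 || written != len)
        return -1;

    return 0;
}
```

With the loaded buffer, the call would presumably be something like write_cmap_file(name, buf->data, buf->len), where name is a file name you build yourself, for example from the font's object number.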
