In our website, some Mac users have troubles when they copy-paste text from PDF


Asked: June 15, 2026


On our website, some Mac users have trouble when they copy and paste text from PDF files into a TextArea (handled by TinyMCE). All accented characters are corrupted and become, for example, e? for é, i? for î, etc. I cannot reproduce this problem on a Windows computer.

When I wrote the content of the TextArea to a file (before inserting it into the database), I discovered that the initial é is visually different from a regular é (in Vim, see below).

Visual example of the problem

Indeed:

// the corrupted é - first line of the screenshot
echo bin2hex($char); // displays 65cc81

// regular é
echo bin2hex('é');   // displays c3a9

After a lot of searching, here is where I am:
It seems that Mac OS copies accented Unicode characters as a combination of two characters: in our example, e + ́ (a combining acute accent, U+0301). So far, I haven't found any way to replace the corrupted é with the real one, to avoid e? in the database.

And I’m a little desperate.


1 Answer

Editorial Team · Answer 1 · 2026-06-15T05:23:23+00:00

The process of converting the representation to one form or the other is called, well, normalization. In PHP there's the Normalizer class (part of the intl extension) for that; sending all input through it is a good idea:

$input = Normalizer::normalize($input);

You likely want to normalize to form C (NFC), Canonical Decomposition followed by Canonical Composition, which is also what Normalizer::normalize uses when no form is given.

Should that class not be available on your system, there’s the Patchwork UTF-8 library.
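
To make this concrete, here is a minimal sketch of how the normalization step could be wrapped before the input reaches the database. The function name normalize_input is hypothetical (it is not from the question or from any library), and the sketch assumes the intl extension is loaded and the input is UTF-8:

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): normalize user
 * input to Unicode form C, so that "e" + U+0301 becomes the single
 * code point é.
 */
function normalize_input(string $input): string
{
    // Normalizer ships with the intl extension; fail loudly if it is missing.
    if (!class_exists('Normalizer')) {
        throw new RuntimeException('The intl extension (Normalizer) is not available.');
    }

    // normalize() returns false when the input cannot be normalized
    // (for example, invalid UTF-8); keep the original string in that case.
    $normalized = Normalizer::normalize($input, Normalizer::FORM_C);

    return $normalized === false ? $input : $normalized;
}
```

With intl loaded, bin2hex(normalize_input("e\xCC\x81")) should give c3a9, the same bytes as the regular é from the question. Text that is already in form C passes through unchanged, so it should be safe to apply the helper to every submitted field.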
