When I copy/paste text from most sites and pdfs, the following characters are almost


Editorial Team

Asked: May 20, 2026


When I copy/paste text from most sites and PDFs, the following characters have almost always been replaced by their Unicode equivalents:

double quote: " becomes “ and ” (&ldquo; and &rdquo;)
single quote: ' becomes ‘ and ’ (&lsquo; and &rsquo;)
ellipsis: ... becomes … (&hellip;)

I understand using characters that can’t be represented without Unicode, like © and ¢, but even for those, I wonder.

When should you use these Unicode equivalents? Are they more semantic than the plain ASCII characters? Are they handled better by devices (copy/paste/print)? I always find it annoying to get those quote and ellipsis characters, because when programming in TextMate you don’t use them.


1 Answer

Editorial Team · Answer 1 · 2026-05-20T19:57:43+00:00

I think there’s a simple explanation: MS Word converts these characters and sequences automatically as you type, and a lot of text on the internet has been copied from that word processor.

Most of the articles I get for my site from other authors are sent as .doc files, and I have to convert them. They usually contain the characters you’ve mentioned.

I’d also add a couple more: the many different types of dashes used instead of the hyphen, and the low opening double quote („, as seen in some European languages).

I usually let them stay in the text (all my pages are Unicode). It’s just important to remember they are there when working with regexes and similar tools; the dashes especially can be tricky and hard to spot.
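If you do need plain ASCII (for code, say), here is a minimal Python sketch of the two approaches mentioned above: converting the typographic characters back, or leaving them in place and making the regex dash-aware. The names `to_ascii_punctuation`, `ANY_DASH` and `split_range` are made up for this example, and the character list is only the handful discussed here, not a complete one.

```python
import re

# Typographic characters and the ASCII stand-ins assumed for this sketch.
ASCII_EQUIVALENTS = {
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u201e": '"',    # low opening double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u2026": "...",  # horizontal ellipsis
    "\u2010": "-",    # hyphen
    "\u2011": "-",    # non-breaking hyphen
    "\u2012": "-",    # figure dash
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2212": "-",    # minus sign
}

# str.maketrans accepts single-character keys mapped to replacement strings.
_TABLE = str.maketrans(ASCII_EQUIVALENTS)


def to_ascii_punctuation(text: str) -> str:
    """Replace the typographic characters listed above with ASCII."""
    return text.translate(_TABLE)


# Character class matching the ASCII hyphen-minus and the dash variants above.
ANY_DASH = "[-\u2010-\u2014\u2212]"


def split_range(text: str):
    """Parse '10-20' style ranges, whichever dash the author typed."""
    match = re.fullmatch(rf"\s*(\d+)\s*{ANY_DASH}\s*(\d+)\s*", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

For example, `to_ascii_punctuation` turns curly quotes back into straight ones, while `split_range` accepts a range written with an en dash just as it accepts one written with a hyphen. Converting is lossy, so I would only do it for text that is headed into source code or a tool that expects ASCII.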
