I’m downloading a CSV from Google Docs and in it characters like “ are saved


Asked: June 15, 2026


I’m downloading a CSV from Google Docs and in it characters like “ are saved as \xE2\x80\x9C and ” are saved as \xE2\x80\x9D.

My question is… what charset are those being saved in? How might I go about figuring that out?


1 Answer

Editorial Team · Answer 1 · 2026-06-15T19:54:01+00:00

It is UTF-8. You can tell by decoding the bytes as UTF-8: they produce exactly the characters you expect, with \xE2\x80\x9C becoming “ (U+201C) and \xE2\x80\x9D becoming ” (U+201D).
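A quick way to check this in Python is to decode the raw bytes with the built-in `utf-8` codec and look up the resulting characters with the standard `unicodedata` module. A minimal sketch, assuming `raw` holds the exact bytes quoted in the question (with a short word added in between for context):

```python
import unicodedata

# The bytes quoted in the question, wrapped around a short word.
raw = b"\xE2\x80\x9Chello\xE2\x80\x9D"

# Raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = raw.decode("utf-8")
print(text)  # “hello”

# Print the code point and official Unicode name of each non-ASCII character.
for ch in text:
    if ord(ch) > 0x7F:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+201C LEFT DOUBLE QUOTATION MARK
# U+201D RIGHT DOUBLE QUOTATION MARK
```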

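UTF-8 also has a very distinctive structure. Every non-ASCII character is encoded as a lead byte (110xxxxx, 1110xxxx or 11110xxx) followed by one to three continuation bytes of the form 10xxxxxx, so E2 80 9C (11100010 10000000 10011100) is exactly one valid three-byte sequence. Text in other encodings rarely happens to follow this pattern: just 3 bytes with the high bit set that form a valid UTF-8 sequence are enough to say something is UTF-8 with roughly 99% confidence, and even 2 such bytes already get you to about 90%.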
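To make the byte pattern concrete, here is a small hand-written decoder that checks only the lead/continuation bit structure described above. It is a teaching sketch with hypothetical function names (`decode_utf8_sequence`, `is_plausible_utf8`); unlike Python's own `utf-8` codec it does not reject overlong encodings or surrogates, so for real validation just call `bytes.decode("utf-8")`.

```python
def decode_utf8_sequence(data: bytes, i: int) -> tuple[int, int]:
    """Decode one UTF-8 sequence starting at data[i]; return (code point, length).

    Sketch only: checks the lead/continuation bit patterns, but not
    overlong encodings, surrogates, or code points above U+10FFFF.
    """
    lead = data[i]
    if lead < 0x80:                 # 0xxxxxxx: plain ASCII, one byte
        return lead, 1
    if lead >> 5 == 0b110:          # 110xxxxx: start of a 2-byte sequence
        length, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:       # 1110xxxx: start of a 3-byte sequence
        length, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:      # 11110xxx: start of a 4-byte sequence
        length, cp = 4, lead & 0x07
    else:                           # 10xxxxxx or 11111xxx cannot start a sequence
        raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
    for j in range(i + 1, i + length):
        # Every following byte must be a continuation byte: 10xxxxxx.
        if j >= len(data) or data[j] >> 6 != 0b10:
            raise ValueError(f"missing continuation byte at offset {j}")
        cp = (cp << 6) | (data[j] & 0x3F)  # append its 6 payload bits
    return cp, length


def is_plausible_utf8(data: bytes) -> bool:
    """True if every byte belongs to a structurally valid UTF-8 sequence."""
    i = 0
    while i < len(data):
        try:
            _, length = decode_utf8_sequence(data, i)
        except ValueError:
            return False
        i += length
    return True


cp, length = decode_utf8_sequence(b"\xE2\x80\x9C", 0)
print(f"U+{cp:04X}, {length} bytes")                        # U+201C, 3 bytes
print(is_plausible_utf8(b"\xE2\x80\x9Chello\xE2\x80\x9D"))  # True
print(is_plausible_utf8("café".encode("cp1252")))           # False
```

In the last example, the Windows-1252 byte 0xE9 (é) looks like the start of a three-byte sequence, but no continuation bytes follow it, which is why accidental matches become rare once a few high-bit bytes are involved.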

If it weren’t UTF-8 but some 8-bit code page instead, it would be impossible to tell the encoding from the bytes alone. Without any other information, you would basically have to brute-force it: decode the data with various 8-bit encodings and check which result looks correct. The other option is an algorithm that goes through the encodings automatically and checks whether the result makes sense in any language.

With more information, such as the operating system and locale the file was saved on, you could narrow down the set of encodings to try considerably.
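As a sketch of that narrowing step, the snippet below assumes the file came from a Windows machine and uses a hypothetical `WINDOWS_ANSI_CODE_PAGES` table, keyed by language code (a simplification of the actual Windows system-locale setting), to choose which legacy code page to try after UTF-8. The function names are illustrative, not from any library.

```python
# Hypothetical, partial mapping from a language code to the legacy Windows
# "ANSI" code page that Windows uses for that language by default.
WINDOWS_ANSI_CODE_PAGES = {
    "en": "cp1252", "de": "cp1252", "fr": "cp1252", "es": "cp1252",
    "pl": "cp1250", "cs": "cp1250", "hu": "cp1250",
    "ru": "cp1251", "uk": "cp1251", "bg": "cp1251",
    "el": "cp1253", "tr": "cp1254", "ja": "cp932",
}


def candidate_encodings(language: str) -> list[str]:
    """Encodings worth trying, most likely first, for a file saved on Windows."""
    candidates = ["utf-8"]  # try first: legacy bytes rarely pass as valid UTF-8
    guess = WINDOWS_ANSI_CODE_PAGES.get(language)
    if guess is not None:
        candidates.append(guess)
    return candidates


def decode_with_hints(data: bytes, language: str) -> tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in candidate_encodings(language):
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Fallback: latin-1 maps every byte to a character, so it never fails,
    # but the result may well be mojibake.
    return data.decode("latin-1"), "latin-1"


data = "Привет".encode("cp1251")
print(decode_with_hints(data, "ru"))  # ('Привет', 'cp1251')
print(decode_with_hints(data, "en"))  # ('Ïðèâåò', 'cp1252')
```

The second call shows the risk of a wrong hint: cp1251 bytes decoded as cp1252 still succeed but produce mojibake, so the extra information only helps if it is accurate.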
