I’ve an ASCII file that contains an EM Dash (— or &mdash; in HTML).

Question

Asked: May 11, 2026

I’ve an ASCII file that contains an EM Dash (— or &mdash; in HTML). The hex value is 0x97. When we pass this file through one application it arrives as UTF-8, and it converts the character to 0xC297, which is &#151; in HTML. However, when we pass this file through a different application it converts the character to 0xE28094 or —.

What would cause these applications to convert these characters differently? Is it perhaps a code page setting?

1 Answer

score 0 · Answer 1 · 2026-05-11T11:18:58+00:00

&#151; is wrong. When you use numeric character references, the number refers to the Unicode codepoint. For numbers below 256 that is the same as the codepoint in ISO-8859-1. In 8859-1, character 151 is amongst the “C1 control codes”, and not a dash or any other visible character.
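To see what codepoint 151 actually is, here is a small Python sketch (Python is my choice for illustration, it is not something from the question). It only uses the standard `unicodedata` and `html` modules; the variable names are made up.

```python
import html
import unicodedata

c1_char = chr(151)    # U+0097, the codepoint a reference to 151 names
em_dash = chr(8212)   # U+2014, the real em dash

# "Cc" is the Unicode general category for control characters.
print(unicodedata.category(c1_char))   # Cc
print(unicodedata.name(em_dash))       # EM DASH

# Both the named and the numeric reference for the em dash resolve to U+2014.
print(html.unescape("&mdash;") == em_dash)    # True
print(html.unescape("&#8212;") == em_dash)    # True
```

So if you want a numeric reference for the dash, 8212 (hex 2014) is the number to use, not 151.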

The confusion arises because character 151 is a dash in Windows code page 1252 (Western European). Many people think cp1252 is the same thing as ISO-8859-1, but in reality it’s not: the characters in the C1 range (128 to 159) are different.
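The difference is easy to reproduce. This is only a sketch in Python, assuming the two applications behave like a plain decode followed by a UTF-8 encode, but it yields exactly the two byte sequences from the question:

```python
raw = b"\x97"  # the single byte from the original file

as_cp1252 = raw.decode("cp1252")       # U+2014 EM DASH
as_latin1 = raw.decode("iso-8859-1")   # U+0097, a C1 control character

utf8_from_cp1252 = as_cp1252.encode("utf-8")
utf8_from_latin1 = as_latin1.encode("utf-8")

print(f"cp1252     -> U+{ord(as_cp1252):04X} -> {utf8_from_cp1252.hex()}")
print(f"iso-8859-1 -> U+{ord(as_latin1):04X} -> {utf8_from_latin1.hex()}")
# cp1252     -> U+2014 -> e28094
# iso-8859-1 -> U+0097 -> c297
```

Same input byte, two different assumptions about the source encoding, two different UTF-8 results.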

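The first application is reading your “ASCII” file* as ISO-8859-1, when it is probably cp1252, so byte 0x97 becomes the control character U+0097 (0xC297 in UTF-8). The second application evidently reads it as cp1252 and gets the em dash U+2014 (0xE28094). You’ll need a way to tell the first app which encoding to expect.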

(*: “ASCII” is a misnomer if there are top-bit-set characters in the file. You probably mean “ANSI”, which is really also a misnomer, but one which has stuck in the Windows world to mean “text encoded in the current system-default code page”.)
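If the application can’t be told about the encoding, one workaround is to transcode the file to UTF-8 yourself before handing it over. A minimal sketch in Python, assuming the file really is cp1252; the function name `transcode_cp1252_to_utf8` and the file names are hypothetical:

```python
import tempfile
from pathlib import Path


def transcode_cp1252_to_utf8(src: Path, dst: Path) -> None:
    """Decode src as Windows-1252 and write it back out as UTF-8."""
    # Work on bytes so that no platform-default encoding or newline
    # translation gets involved.
    text = src.read_bytes().decode("cp1252")
    dst.write_bytes(text.encode("utf-8"))


# Small demonstration with a throwaway file containing the 0x97 byte.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    src.write_bytes(b"before \x97 after")
    transcode_cp1252_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted)  # b'before \xe2\x80\x94 after'
```

Note that Python’s cp1252 codec raises `UnicodeDecodeError` on the five bytes that cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so a failure here is a hint that the file is in some other encoding.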
