I have a twelve-year-old Windows program. As may be obvious to the knowledgeable, it

Question

Asked: May 18, 2026

I have a twelve-year-old Windows program. As may be obvious to the knowledgeable, it was designed for ASCII characters, not Unicode. Most of it has been converted, but there’s one spot that still needs to be changed over. There is a serious constraint on it though: the exact same ~~ASCII~~ byte sequence MUST be created by different encoders, some of which will be operating on non-Windows systems.

I’m trying to determine whether UTF-8 will do the trick or not. I’ve heard in passing that different UTF-8 sequences can come up with the same Unicode string, which would be a problem here.

So the question is: given a Unicode string, can I expect a single canonical UTF-8 sequence to be generated by any standards-conforming implementation of a converter? Or are there multiple possibilities?

1 Answer

Editorial Team · Answer 1 · 2026-05-18T00:36:00+00:00

Any given Unicode string will have only one representation in UTF-8.

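I think the confusion here is that Unicode offers multiple ways to get the same visual output for some languages. For example, "é" can be the single code point U+00E9 or the letter "e" followed by the combining accent U+0301. Those are two different Unicode strings, so they encode to two different byte sequences. Not to mention that Unicode has several characters that have no visual representation at all.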
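To make the point concrete, here is a minimal Python sketch of two strings that render identically but are different code point sequences. The helper name `utf8_after_nfc` is made up for illustration. Choosing NFC as the agreed normalization form is an assumption, not something the question specifies. All encoders would have to agree on the same form, and on the same Unicode version for newly assigned characters.

```python
import unicodedata

precomposed = "\u00e9"   # "é" as one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

# The two strings look the same on screen but are not equal,
# so their UTF-8 encodings differ as well.
print(precomposed == decomposed)      # False
print(precomposed.encode("utf-8"))    # b'\xc3\xa9'
print(decomposed.encode("utf-8"))     # b'e\xcc\x81'


def utf8_after_nfc(text: str) -> bytes:
    """Hypothetical helper: normalize to NFC first, then encode as UTF-8."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


# After normalization both inputs yield the same bytes.
print(utf8_after_nfc(precomposed) == utf8_after_nfc(decomposed))  # True
```

If the strings come from user input or from different platforms, normalizing before encoding is probably what you need to get identical bytes everywhere.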

But this has nothing to do with UTF-8; it's a property of Unicode itself. Encoding a given Unicode string as UTF-8 is a purely mechanical process, and it's perfectly reversible.

The conversion rules are here:
http://en.wikipedia.org/wiki/UTF-8
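As a rough illustration of how mechanical those rules are, the sketch below hand-encodes code points in Python and compares the result with the built-in encoder. The function names `encode_code_point` and `encode_utf8` are invented for this example. The length of each sequence is fixed by the code point's range, and the standard forbids longer ("overlong") forms, which is why a conforming encoder has only one possible output per code point.

```python
def encode_code_point(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 (illustrative helper)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Out-of-range values and surrogates are not encodable in UTF-8.
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:          # 1 byte:  0xxxxxxx
        return bytes([cp])
    if cp < 0x800:         # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:       # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])


def encode_utf8(text: str) -> bytes:
    """Encode a whole string by encoding each code point in order."""
    return b"".join(encode_code_point(ord(ch)) for ch in text)


# The hand-written encoder agrees with Python's built-in one.
for sample in ["A", "\u00e9", "\u20ac", "\U0001F600", "na\u00efve \u2603"]:
    assert encode_utf8(sample) == sample.encode("utf-8")
```

Two things commonly make byte sequences differ in practice even though the encoding itself is unambiguous: a byte order mark (EF BB BF) that some tools prepend, and non-standard variants such as Java's "modified UTF-8". Both are worth ruling out when comparing output across platforms.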
