The Archive Base

Editorial Team
Asked: June 6, 2026 (2026-06-06T22:13:05+00:00)

I have a string that displays UTF-8 encoded characters, and I want to convert it back to Unicode

I have a string that displays UTF-8 encoded characters, and I want to convert it back to Unicode.

For now, my implementation is the following:

public static string DecodeFromUtf8(this string utf8String)
{
    // read the string as UTF-8 bytes.
    byte[] encodedBytes = Encoding.UTF8.GetBytes(utf8String);

    // convert them into unicode bytes.
    byte[] unicodeBytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, encodedBytes);

    // builds the converted string.
    return Encoding.Unicode.GetString(encodedBytes);
}

I am playing with the word "déjà". I converted it into UTF-8 through this online tool, and then started testing my method with the string "dÃ©jÃ " (that is, the characters d\u00C3\u00A9j\u00C3\u00A0, where the last one is a non-breaking space).

Unfortunately, with this implementation the string just remains the same.

Where am I wrong?

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 10:13 pm (2026-06-06T22:13:06+00:00)

    The issue is that UTF-8 code unit values have been stored as a sequence of 16-bit code units in a C# string. You need to verify that each code unit is within the range of a byte, copy those values into a byte array, and then decode that UTF-8 byte sequence into a (UTF-16) string.

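    // Requires: using System; using System.Text;
    // Extension methods must be declared in a non-generic static class.
    public static string DecodeFromUtf8(this string utf8String)
    {
        // copy each 16-bit code unit of the string into a byte.
        byte[] utf8Bytes = new byte[utf8String.Length];
        for (int i = 0; i < utf8String.Length; ++i)
        {
            // a char above 255 cannot be a UTF-8 code unit, so reject it instead of truncating.
            if (utf8String[i] > 255)
                throw new ArgumentException("Each char must be in byte's range.", nameof(utf8String));
            utf8Bytes[i] = (byte)utf8String[i];
        }

        // decode the recovered bytes as UTF-8.
        return Encoding.UTF8.GetString(utf8Bytes, 0, utf8Bytes.Length);
    }

    DecodeFromUtf8("d\u00C3\u00A9j\u00C3\u00A0"); // déjà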
    

    That fixes the symptom, but it is better to find the root cause: the place where UTF-8 code units are being copied into 16-bit code units. The likely culprit is code that converts bytes into a C# string using the wrong encoding, e.g. Encoding.Default.GetString(utf8Bytes, 0, utf8Bytes.Length).
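
    To make the root cause concrete, here is a minimal sketch that reproduces the mangled string on purpose. It assumes ISO-8859-1 as a stand-in for whatever wrong encoding was actually used; the class and method names (MojibakeDemo, DecodeWrongly, DecodeCorrectly) are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text;

public static class MojibakeDemo
{
    // Decodes UTF-8 bytes with the wrong (single-byte) encoding, reproducing the bug.
    public static string DecodeWrongly(string original)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
        // Assumption: ISO-8859-1 stands in for the unknown wrong encoding.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        return latin1.GetString(utf8Bytes);
    }

    // Decodes the same bytes with the encoding that actually produced them.
    public static string DecodeCorrectly(string original)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // "d\u00E9j\u00E0" is "déjà" written with escapes.
        string mangled = DecodeWrongly("d\u00E9j\u00E0");
        Console.WriteLine(mangled == "d\u00C3\u00A9j\u00C3\u00A0"); // True
        Console.WriteLine(DecodeCorrectly("d\u00E9j\u00E0") == "d\u00E9j\u00E0"); // True
    }
}
```

    Each byte of the UTF-8 sequence becomes one char, which is why the four-character word turns into a six-character string.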


    Alternatively, if you know which incorrect encoding produced the string, and that incorrect decoding was lossless (usually the case when the incorrect encoding is a single-byte encoding), you can apply the inverse encoding step to recover the original UTF-8 bytes and then decode them correctly as UTF-8:

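    public static string UndoEncodingMistake(string mangledString, Encoding mistake, Encoding correction)
    {
        // the inverse of `mistake.GetString(originalBytes);`
        byte[] originalBytes = mistake.GetBytes(mangledString);
        // decode the recovered bytes with the encoding that actually produced them.
        return correction.GetString(originalBytes);
    }
    
    // On .NET Core / .NET 5+, code page 1252 is only available after
    // Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) has been called.
    UndoEncodingMistake("d\u00C3\u00A9j\u00C3\u00A0", Encoding.GetEncoding(1252), Encoding.UTF8); // déjà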
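
    The "lossless" condition above can be checked rather than assumed. The following is a sketch only, not part of the original answer: it assumes ISO-8859-1 as the mistaken encoding in the usage example, and the names EncodingRepair and TryUndoEncodingMistake are hypothetical. It uses exception fallbacks so that any char the mistaken encoding cannot represent, or any byte sequence that is not valid UTF-8, makes the repair fail instead of silently producing replacement characters.

```csharp
using System;
using System.Text;

public static class EncodingRepair
{
    // Returns true and sets 'repaired' only if both conversion steps succeed without data loss.
    public static bool TryUndoEncodingMistake(string mangled, string mistakeName, out string repaired)
    {
        // Exception fallbacks make unrepresentable chars or invalid bytes throw
        // instead of being replaced with '?' or U+FFFD.
        Encoding mistake = Encoding.GetEncoding(
            mistakeName, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);
        Encoding strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            byte[] originalBytes = mistake.GetBytes(mangled);
            repaired = strictUtf8.GetString(originalBytes);
            return true;
        }
        catch (EncoderFallbackException)
        {
            // The string contains a char the mistaken encoding could never have produced.
            repaired = null;
            return false;
        }
        catch (DecoderFallbackException)
        {
            // The recovered bytes are not valid UTF-8, so the string was probably not mangled this way.
            repaired = null;
            return false;
        }
    }

    public static void Main()
    {
        bool ok = TryUndoEncodingMistake("d\u00C3\u00A9j\u00C3\u00A0", "iso-8859-1", out string fixedText);
        Console.WriteLine(ok && fixedText == "d\u00E9j\u00E0"); // True
        // U+20AC (euro sign) has no ISO-8859-1 byte, so the inverse step fails.
        Console.WriteLine(TryUndoEncodingMistake("\u20AC", "iso-8859-1", out _)); // False
    }
}
```

    A failed repair is a useful signal in itself: it suggests either that the guessed encoding is wrong or that the string was never mangled in the first place.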
    
© 2021 The Archive Base. All Rights Reserved