The Archive Base


Editorial Team
Asked: May 14, 2026

I have a String created from a byte[] array, using UTF-8 encoding.
However, it should have been created using another encoding (Windows-1252).

Is there a way to convert this String back to the right encoding?

I know it’s easy to do if you have access to the original byte array, but in my case it’s too late because the String is given by a closed-source library.

1 Answer

    Editorial Team
    Added an answer on May 14, 2026 at 8:34 am

    As there seems to be some confusion about whether this is possible, I think an extensive example is in order.

    The question claims that the (initial) input is a byte[] that contains Windows-1252 encoded data. I’ll call that byte[] ib (for "initial bytes").

    For this example I’ll choose the German word "Bär" (meaning bear) as the input:

    byte[] ib = new byte[] { (byte) 0x42, (byte) 0xE4, (byte) 0x72 };
    String correctString = new String(ib, "Windows-1252");
    assert correctString.charAt(1) == '\u00E4'; //verify that the character was correctly decoded.
    

    (If your JVM doesn’t support that encoding, you can use ISO-8859-1 instead, because those three letters (and most others) are at the same position in both encodings; the two differ only in the range 0x80 to 0x9F.)
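The fallback mentioned above could be written roughly as follows. This is only a sketch: the class name `CharsetFallback` and the method `westernCharset` are made up for illustration, and it uses the `Charset` overload of the `String` constructor, which avoids the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetFallback {
    // Hypothetical helper: windows-1252 if this JVM provides it, otherwise ISO-8859-1.
    static Charset westernCharset() {
        return Charset.isSupported("windows-1252")
                ? Charset.forName("windows-1252")
                : StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        byte[] ib = { 0x42, (byte) 0xE4, 0x72 }; // "Bär" in both encodings
        String correctString = new String(ib, westernCharset());
        // 0xE4 maps to U+00E4 in windows-1252 and in ISO-8859-1 alike.
        System.out.println(correctString.charAt(1) == '\u00E4');
    }
}
```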

    The question goes on to state that some other code (that is outside of our influence) already converted that byte[] to a String using the UTF-8 encoding (I’ll call that String is for "input String"). That String is the only input that is available to achieve our goal (if ib were available, it would be trivial):

    String is = new String(ib, "UTF-8");
    System.out.println(is);
    

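    This produces the incorrect output "B�": the decoder cannot interpret the byte 0xE4 as valid UTF-8 here and substitutes U+FFFD, the replacement character "�". (The exact output can vary by JVM version; recent JVMs typically keep the trailing "r" and print "B�r".)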

    The goal would be to produce ib (or the correct decoding of that byte[]) with only is available.

    Now some people claim that getting the UTF-8 encoded bytes from that is will return an array with the same values as the initial array:

    byte[] utf8Again = is.getBytes("UTF-8");
    

    But that returns the UTF-8 encoding of the two characters B and � and definitely returns the wrong result when re-interpreted as Windows-1252:

    System.out.println(new String(utf8Again, "Windows-1252"));
    

    This line produces the output "Bï¿½" (the three UTF-8 bytes of "�", 0xEF 0xBF 0xBD, shown as separate Windows-1252 characters), which is totally wrong. It is also the same output that would result if the initial array had contained the non-word "Bür" instead.

    So in this case you can’t undo the operation, because some information was lost.
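To make the information loss concrete, here is a small self-contained sketch. The class name `LossyDecodeDemo` and the helper `decodeAsUtf8` are illustrative names, not part of any library. It decodes the Windows-1252 bytes of "Bär" and "Bür" as UTF-8 and compares the results; on recent JDKs I would expect both to collapse into the same String, which is why no function of that String alone can tell them apart.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LossyDecodeDemo {
    // Decodes bytes as UTF-8; invalid sequences are replaced with U+FFFD.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] baer = { 0x42, (byte) 0xE4, 0x72 }; // "Bär" in Windows-1252
        byte[] buer = { 0x42, (byte) 0xFC, 0x72 }; // "Bür" in Windows-1252

        String fromBaer = decodeAsUtf8(baer);
        String fromBuer = decodeAsUtf8(buer);

        // Two different inputs, one decoded String: the original byte is gone.
        System.out.println("same String: " + fromBaer.equals(fromBuer));
        System.out.println("contains U+FFFD: " + (fromBaer.indexOf('\uFFFD') >= 0));

        // Re-encoding yields the bytes of U+FFFD, not the original 0xE4.
        byte[] utf8Again = fromBaer.getBytes(StandardCharsets.UTF_8);
        System.out.println("round trip intact: " + Arrays.equals(baer, utf8Again));
    }
}
```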

    There are in fact cases where such mis-encodings can be undone. It’s more likely to work when all possible (or at least all occurring) byte sequences are valid in the encoding that was wrongly used for decoding. ISO-8859-1, for example, maps every byte value to a character, so nothing is lost. Since UTF-8 has several byte sequences that are simply not valid, you will have problems.
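For contrast, here is a sketch of the reverse situation, where the repair does work: UTF-8 bytes that were wrongly decoded as ISO-8859-1. The class `ReversibleDecodeDemo` and the method `redecode` are hypothetical names chosen for this example. Because ISO-8859-1 assigns a character to every byte value, re-encoding with it should give back the original bytes exactly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class ReversibleDecodeDemo {
    // Re-encodes with the charset that was used by mistake,
    // then decodes the recovered bytes with the intended charset.
    static String redecode(String wrong, Charset usedByMistake, Charset intended) {
        return new String(wrong.getBytes(usedByMistake), intended);
    }

    public static void main(String[] args) {
        byte[] original = "B\u00E4r".getBytes(StandardCharsets.UTF_8); // 42 C3 A4 72

        // Wrong decoding: each byte becomes one character, none is dropped.
        String wrong = new String(original, StandardCharsets.ISO_8859_1);

        String repaired = redecode(wrong, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(repaired.equals("B\u00E4r"));
    }
}
```

The same call applied to the String from the question (decoded as UTF-8 by mistake) cannot succeed, because the byte 0xE4 was already replaced before the String was handed over.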
