The Archive Base


Editorial Team
Asked: May 13, 2026

EDIT: I’ve been convinced that this question is somewhat non-sensical. Thanks to those who responded. I may post a follow-up question that is more specific.

Today I was investigating some encoding problems and wrote this unit test to isolate a basic repro case:

int badCount = 0;
for (int i = 1; i < 255; i++) {
    String str = "Hi " + new String(new char[] { (char) i });

    String toLatin1  = new String(str.getBytes("UTF-8"), "latin1");
    assertEquals(str, new String(toLatin1.getBytes("latin1"), "UTF-8"));

    String toWin1252 = new String(str.getBytes("UTF-8"), "Windows-1252");
    String fromWin1252 = new String(toWin1252.getBytes("Windows-1252"), "UTF-8");

    if (!str.equals(fromWin1252)) {
        System.out.println("Can't encode: " + i + " - " + str + 
                           " - encodes as: " + fromWin1252);
        badCount++;
    }
}

System.out.println("Bad count: " + badCount);

The output:

    Can't encode: 129 - Hi ? - encodes as: Hi ??
    Can't encode: 141 - Hi ? - encodes as: Hi ??
    Can't encode: 143 - Hi ? - encodes as: Hi ??
    Can't encode: 144 - Hi ? - encodes as: Hi ??
    Can't encode: 157 - Hi ? - encodes as: Hi ??
    Can't encode: 193 - Hi Á - encodes as: Hi ??
    Can't encode: 205 - Hi Í - encodes as: Hi ??
    Can't encode: 207 - Hi Ï - encodes as: Hi ??
    Can't encode: 208 - Hi ? - encodes as: Hi ??
    Can't encode: 221 - Hi ? - encodes as: Hi ??
    Bad count: 10

JDK 1.6.0_07 on Mac OS 10.6.2

My observation:

Latin1 symmetrically encodes all 254 characters. Windows-1252 does not. The three printable characters (193, 205, 207) are the same codes in Latin1 and Windows-1252, so I wouldn’t expect any issues.

Can anyone explain this behavior? Is this a JDK bug?

— James

1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 2:58 pm

    In my opinion, the test program is deeply flawed: it decodes UTF-8 bytes as if they were Latin-1 or Windows-1252, which produces Strings with no semantic meaning, and then reverses that step.

    If you want to check if all byte values are valid values for a given encoding, then something like this might be more like it:

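    import java.io.UnsupportedEncodingException;
    import java.util.Arrays;

    // Round-trips every single byte value from 1 to 254 through the given encoding
    // and reports the values that do not survive decode-then-encode.
    public static void tryEncoding(final String encoding) throws UnsupportedEncodingException {
        int badCount = 0;
        for (int i = 1; i < 255; i++) {
            byte[] bytes = new byte[] { (byte) i };

            // Decode the single byte to a String, then encode it back to bytes.
            String toString = new String(bytes, encoding);
            byte[] fromString = toString.getBytes(encoding);

            // A mismatch means the byte did not round-trip in this encoding.
            if (!Arrays.equals(bytes, fromString)) {
                System.out.println("Can't encode: " + i + " - in: " + Arrays.toString(bytes) + "/ out: "
                        + Arrays.toString(fromString) + " - result: " + toString);
                badCount++;
            }
        }

        System.out.println("Bad count: " + badCount);
    }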
    

    Note that this test program feeds in the (unsigned) byte values from 1 to 254. The code in the question instead iterates over the char values (equivalent to Unicode code points in this range) from 1 to 254 and encodes them as UTF-8, so every value above 127 becomes two bytes.

    Try printing the actual byte arrays handled by the program in the question, and you will see that you’re not actually checking all byte values and that some of your “bad” matches are duplicates of others.
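
    As a rough sketch of that suggestion, the following dumps the unsigned UTF-8 bytes behind each char value used in the question. The class and method names are illustrative only (they do not appear in the question), and StandardCharsets assumes Java 7 or later. Char 193 (Á) encodes as C3 81 and char 129 as C2 81, so both end in the same trailing byte 0x81 once the bytes are reinterpreted as Windows-1252.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original question or answer.
class Utf8ByteDump {

    // Returns the UTF-8 encoding of a single char value as unsigned ints (0..255).
    static int[] utf8Bytes(int charValue) {
        byte[] raw = new String(new char[] { (char) charValue })
                .getBytes(StandardCharsets.UTF_8);
        int[] unsigned = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            unsigned[i] = raw[i] & 0xFF; // strip the sign extension
        }
        return unsigned;
    }

    // Lists the char values in 1..254 whose UTF-8 form contains the given byte.
    static List<Integer> charsWhoseUtf8Contains(int unsignedByte) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int c = 1; c < 255; c++) {
            for (int b : utf8Bytes(c)) {
                if (b == unsignedByte) {
                    hits.add(c);
                    break;
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Print which char values share each byte that Windows-1252 leaves unused.
        int[] unused = { 0x81, 0x8D, 0x8F, 0x90, 0x9D };
        for (int b : unused) {
            System.out.println(Integer.toHexString(b) + " <- " + charsWhoseUtf8Contains(b));
        }
    }
}
```

    Under these assumptions, the ten failures in the question pair up as 129/193, 141/205, 143/207, 144/208 and 157/221: each pair shares one trailing UTF-8 byte, so the question's loop really only exercises five distinct problem bytes.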

    Running this with "Windows-1252" as the argument produces this output:

    Can't encode: 129 - in: [-127]/ out: [63] - result: �
    Can't encode: 141 - in: [-115]/ out: [63] - result: �
    Can't encode: 143 - in: [-113]/ out: [63] - result: �
    Can't encode: 144 - in: [-112]/ out: [63] - result: �
    Can't encode: 157 - in: [-99]/ out: [63] - result: �
    Bad count: 5
    

    Which tells us that Windows-1252 doesn’t accept the byte values 129, 141, 143, 144 and 157 as valid values. (Note: I’m talking about unsigned byte values here. The output above shows -127, -115, … because Java’s byte type is signed.)
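
    To map the signed values in that output back to the unsigned values discussed here, masking with 0xFF is enough. This is a minimal sketch; the class and method names are made up for illustration.

```java
// Hypothetical helper, not part of the original answer.
class UnsignedBytes {

    // Widens a signed Java byte (-128..127) to its unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        // The signed values printed by Arrays.toString in the output above.
        byte[] signed = { -127, -115, -113, -112, -99 };
        for (byte b : signed) {
            System.out.println(b + " -> " + toUnsigned(b)
                    + " (0x" + Integer.toHexString(toUnsigned(b)).toUpperCase() + ")");
        }
    }
}
```

    On Java 8 and later, Byte.toUnsignedInt(b) does the same thing.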

    The Wikipedia article on Windows-1252 seems to verify this observation by stating this:

    According to the information on Microsoft’s and the Unicode Consortium’s websites, positions 81, 8D, 8F, 90, and 9D are unused
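
    If you would rather have the JDK report such bytes than silently substitute a replacement character, a strict CharsetDecoder is one option. The sketch below is only an illustration (the class and method names are invented here); it relies on the standard java.nio.charset API and assumes the JDK's Windows-1252 table leaves the same positions unmapped, which is what the output above suggests.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original answer.
class StrictDecodeCheck {

    // Returns the unsigned byte values (0..255) that the charset refuses to decode
    // when malformed and unmappable input is reported instead of replaced.
    static List<Integer> undecodableBytes(String charsetName) {
        Charset charset = Charset.forName(charsetName);
        List<Integer> bad = new ArrayList<Integer>();
        for (int i = 0; i <= 255; i++) {
            // A fresh decoder per byte keeps each check independent.
            CharsetDecoder decoder = charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(new byte[] { (byte) i }));
            } catch (CharacterCodingException e) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("windows-1252: " + undecodableBytes("windows-1252"));
        System.out.println("ISO-8859-1:   " + undecodableBytes("ISO-8859-1"));
    }
}
```

    Unlike new String(bytes, encoding), which always substitutes a replacement character, this approach fails loudly, which is usually what you want when validating input.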
