The Archive Base

Editorial Team
Asked: May 28, 2026 (2026-05-28T03:03:33+00:00)

java.nio.charset.Charset.forName("utf8").decode decodes a byte sequence of ED A0 80 ED B0 80 into the


java.nio.charset.Charset.forName("utf8").decode decodes a byte sequence of

 ED A0 80 ED B0 80

into the Unicode codepoint:

 U+10000

java.nio.charset.Charset.forName("utf8").decode also decodes a byte sequence of

 F0 90 80 80

into the Unicode codepoint:

 U+10000

This is verified by the code below.

This seems to say that UTF-8 decodes both ED A0 80 ED B0 80 and F0 90 80 80 into the same Unicode code point.

However, the page at https://www.google.com/search?query=%ED%A0%80%ED%B0%80 is clearly different from the page at https://www.google.com/search?query=%F0%90%80%80.

Since Google Search also uses UTF-8 (correct me if I'm wrong), this suggests that UTF-8 does not decode ED A0 80 ED B0 80 and F0 90 80 80 into the same Unicode code point(s).

So, according to the official standard, should UTF-8 decode the byte sequence ED A0 80 ED B0 80 into the Unicode code point U+10000?

Code:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class Test {

    // Decode the bytes as UTF-8 and print each resulting UTF-16 code unit in hex.
    static void printDecoded(byte[] bytes) {
        CharBuffer cb = Charset.forName("utf8").decode(ByteBuffer.wrap(bytes));
        for (int i = 0; i < cb.limit(); ++i) {
            System.out.println(Integer.toHexString(cb.get(i)));
        }
    }

    public static void main(String[] args) {
        // The six-byte sequence in question.
        printDecoded(new byte[] { (byte) 0xED, (byte) 0xA0, (byte) 0x80, (byte) 0xED, (byte) 0xB0, (byte) 0x80 });
        System.out.println();
        // The four-byte sequence in question.
        printDecoded(new byte[] { (byte) 0xF0, (byte) 0x90, (byte) 0x80, (byte) 0x80 });
    }
}
1 Answer

    Editorial Team
    Added an answer on May 28, 2026 at 3:03 am (2026-05-28T03:03:34+00:00)

    ED A0 80 ED B0 80 is what results from encoding each half of the UTF-16 surrogate pair D800 DC00 as its own three-byte UTF-8 sequence. This is NOT allowed in UTF-8:

    "However, pairs of UCS-2 values between D800 and DFFF (surrogate pairs in Unicode parlance)…need special treatment: the UTF-16 transformation must be undone, yielding a UCS-4 character that is then transformed as above."
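To make the byte arithmetic concrete, here is a minimal sketch that applies the three-byte and four-byte UTF-8 bit patterns by hand. The class and method names (`SurrogateBytes`, `threeByteForm`, `fourByteForm`, `hex`) are made up for this illustration and are not part of any library. `threeByteForm` deliberately skips the surrogate check that a conforming encoder would perform.

```java
public class SurrogateBytes {

    // Apply the three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx to a 16-bit value.
    // No surrogate check is done here, which is exactly what a conforming
    // UTF-8 encoder must NOT do for values in D800..DFFF.
    static byte[] threeByteForm(int value) {
        return new byte[] {
            (byte) (0xE0 | (value >> 12)),
            (byte) (0x80 | ((value >> 6) & 0x3F)),
            (byte) (0x80 | (value & 0x3F))
        };
    }

    // Apply the four-byte pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    // to a supplementary code point (U+10000 and above).
    static byte[] fourByteForm(int codePoint) {
        return new byte[] {
            (byte) (0xF0 | (codePoint >> 18)),
            (byte) (0x80 | ((codePoint >> 12) & 0x3F)),
            (byte) (0x80 | ((codePoint >> 6) & 0x3F)),
            (byte) (0x80 | (codePoint & 0x3F))
        };
    }

    // Format bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10000 as a UTF-16 surrogate pair: D800 DC00.
        char[] pair = Character.toChars(0x10000);

        // Encoding each surrogate separately gives the six-byte form.
        System.out.println(hex(threeByteForm(pair[0])) + " " + hex(threeByteForm(pair[1])));
        // prints "ED A0 80 ED B0 80"

        // Undoing the UTF-16 transformation first gives the valid four-byte form.
        int codePoint = Character.toCodePoint(pair[0], pair[1]);
        System.out.println(hex(fourByteForm(codePoint)));
        // prints "F0 90 80 80"
    }
}
```

The second path is the one the quoted passage describes: combine the pair back into a single code point, then encode that.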

    However, such an encoding is used in CESU-8 and Java’s “Modified UTF-8”.
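As a rough illustration of Modified UTF-8, the sketch below compares `DataOutputStream.writeUTF`, which is documented to write modified UTF-8 after a two-byte length prefix, with ordinary `String.getBytes(StandardCharsets.UTF_8)`. The class name `ModifiedUtf8Demo` and its helper methods are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {

    // Return the modified UTF-8 bytes that writeUTF produces for s,
    // with the leading two-byte length prefix stripped off.
    static byte[] modifiedUtf8(String s) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DataOutputStream data = new DataOutputStream(out)) {
            data.writeUTF(s);
        }
        byte[] withLength = out.toByteArray();
        return Arrays.copyOfRange(withLength, 2, withLength.length);
    }

    // Format bytes as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // U+10000, held in a Java String as the surrogate pair D800 DC00.
        String s = new String(Character.toChars(0x10000));

        // Modified UTF-8 encodes each UTF-16 code unit on its own.
        System.out.println(hex(modifiedUtf8(s)));                       // ED A0 80 ED B0 80

        // Standard UTF-8 encodes the code point as a whole.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));    // F0 90 80 80
    }
}
```

So the six-byte form is a legitimate output of those other encodings, just not of UTF-8 itself.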

    On the assumption that Google Search also uses UTF-8:

    It appears, based on the search box, that Google is using some kind of encoding auto-detection. If you pass it F0 90 80 80, which is valid UTF-8, it interprets it as UTF-8 (𐀀, U+10000). If you pass it ED A0 80 ED B0 80, which is invalid UTF-8, it interprets it as windows-1252 (í €í°€, where the second character is a no-break space).
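How Google actually detects encodings is not known here, so the following is only a sketch of the general idea: try a strict UTF-8 decode and fall back to windows-1252 if it fails. The class name `DecodeWithFallback` and the fallback policy are assumptions made for illustration. It also assumes a JDK whose UTF-8 decoder rejects encoded surrogates, which recent ones do; the output reported in the question suggests a runtime that did not.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class DecodeWithFallback {

    // Hypothetical policy: strict UTF-8 first, windows-1252 if the bytes are malformed.
    static String decode(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: reinterpret every byte as a windows-1252 character.
            return new String(bytes, Charset.forName("windows-1252"));
        }
    }

    public static void main(String[] args) {
        byte[] valid = { (byte) 0xF0, (byte) 0x90, (byte) 0x80, (byte) 0x80 };
        byte[] invalid = { (byte) 0xED, (byte) 0xA0, (byte) 0x80,
                           (byte) 0xED, (byte) 0xB0, (byte) 0x80 };

        // Print the resulting UTF-16 code units in hex for each input.
        for (byte[] input : new byte[][] { valid, invalid }) {
            StringBuilder units = new StringBuilder();
            for (char c : decode(input).toCharArray()) {
                units.append(Integer.toHexString(c)).append(' ');
            }
            System.out.println(units.toString().trim());
        }
    }
}
```

Under those assumptions, the four-byte input decodes to the surrogate pair for U+10000, while the six-byte input comes back as six separate windows-1252 characters, which matches the difference seen between the two search pages.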

© 2021 The Archive Base. All Rights Reserved