The Archive Base

Editorial Team
Asked: May 26, 2026

What does it mean to say "Java Modified UTF-8 Encoding"? How is it different from normal UTF-8 encoding?

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 10:23 am

    This is described in detail in the javadoc of DataInput:

    Modified UTF-8

    Implementations of the DataInput and DataOutput interfaces represent Unicode strings in a format that is a slight modification of UTF-8. (For information regarding the standard UTF-8 format, see section 3.9 Unicode Encoding Forms of The Unicode Standard, Version 4.0). Note that in the following tables, the most significant bit appears in the far left-hand column.

    … (some tables, please see the javadoc for them) …

    The differences between this format and the standard UTF-8 format are the following:

    • The null byte '\u0000' is encoded in 2-byte format rather than 1-byte, so that the encoded strings never have embedded nulls.
    • Only the 1-byte, 2-byte, and 3-byte formats are used.
    • Supplementary characters are represented in the form of surrogate pairs.
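
The first and third differences can be observed directly by comparing the bytes produced by DataOutputStream.writeUTF with those of standard UTF-8. The following is a minimal sketch, not part of the javadoc; the class name ModifiedUtf8Diff and its helper methods are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Diff {
    /** Modified UTF-8 bytes of s, with the 2-byte length prefix written by writeUTF stripped off. */
    public static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                out.writeUTF(s);
            }
            byte[] all = buf.toByteArray();
            return Arrays.copyOfRange(all, 2, all.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Formats bytes as space-separated upper-case hex. */
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nul = "\0";                 // the null character
        String smiley = "\uD83D\uDE00";    // U+1F600, a supplementary character

        System.out.println(hex(nul.getBytes(StandardCharsets.UTF_8)));    // 00
        System.out.println(hex(modifiedUtf8(nul)));                       // C0 80
        System.out.println(hex(smiley.getBytes(StandardCharsets.UTF_8))); // F0 9F 98 80
        System.out.println(hex(modifiedUtf8(smiley)));                    // ED A0 BD ED B8 80
    }
}
```

Standard UTF-8 uses one 4-byte sequence for U+1F600, while the modified format encodes each of the two surrogates as its own 3-byte group.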

    How to read it is described in detail in the javadoc of DataInput#readUTF():

    readUTF

    String readUTF()
               throws IOException
    

    Reads in a string that has been encoded using a modified UTF-8 format. The general contract of readUTF is that it reads a representation of a Unicode character string encoded in modified UTF-8 format; this string of characters is then returned as a String.

    First, two bytes are read and used to construct an unsigned 16-bit integer in exactly the manner of the readUnsignedShort method. This integer value is called the UTF length and specifies the number of additional bytes to be read. These bytes are then converted to characters by considering them in groups. The length of each group is computed from the value of the first byte of the group. The byte following a group, if any, is the first byte of the next group.
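
To illustrate the length prefix, the sketch below writes a string with writeUTF and reads back only the first two bytes with readUnsignedShort. The class UtfLengthPrefix and the method utfLength are hypothetical names used for this example. Note that the UTF length counts encoded bytes, not characters.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class UtfLengthPrefix {
    /** Returns the UTF length that writeUTF stores in front of the encoded string. */
    public static int utfLength(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                out.writeUTF(s);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                // The same two bytes that readUTF consumes first.
                return in.readUnsignedShort();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(utfLength("abc"));         // 3: three 1-byte groups
        System.out.println(utfLength("h\u00E9llo"));  // 6: U+00E9 takes a 2-byte group
        System.out.println(utfLength("\u20AC"));      // 3: U+20AC takes a 3-byte group
    }
}
```

Because the prefix is an unsigned 16-bit value, a single writeUTF call can carry at most 65535 encoded bytes.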

    If the first byte of a group matches the bit pattern 0xxxxxxx (where x means “may be 0 or 1”), then the group consists of just that byte. The byte is zero-extended to form a character.

    If the first byte of a group matches the bit pattern 110xxxxx, then the group consists of that byte a and a second byte b. If there is no byte b (because byte a was the last of the bytes to be read), or if byte b does not match the bit pattern 10xxxxxx, then a UTFDataFormatException is thrown. Otherwise, the group is converted to the character:

    (char)(((a & 0x1F) << 6) | (b & 0x3F))
    

    If the first byte of a group matches the bit pattern 1110xxxx, then the group consists of that byte a and two more bytes b and c. If there is no byte c (because byte a was one of the last two of the bytes to be read), or either byte b or byte c does not match the bit pattern 10xxxxxx, then a UTFDataFormatException is thrown. Otherwise, the group is converted to the character:

    (char)(((a & 0x0F) << 12) | ((b & 0x3F) << 6) | (c & 0x3F))
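
The 2-byte and 3-byte conversion expressions can be tried on concrete bytes. In this small sketch the class GroupFormulas and the methods decode2 and decode3 are illustrative names only; the expressions themselves are copied from the javadoc text.

```java
public class GroupFormulas {
    /** 2-byte group: 110xxxxx 10xxxxxx. */
    public static char decode2(int a, int b) {
        return (char) (((a & 0x1F) << 6) | (b & 0x3F));
    }

    /** 3-byte group: 1110xxxx 10xxxxxx 10xxxxxx. */
    public static char decode3(int a, int b, int c) {
        return (char) (((a & 0x0F) << 12) | ((b & 0x3F) << 6) | (c & 0x3F));
    }

    public static void main(String[] args) {
        System.out.println((int) decode2(0xC0, 0x80));                      // 0, the 2-byte null character
        System.out.println(Integer.toHexString(decode2(0xC3, 0xA9)));       // e9
        System.out.println(Integer.toHexString(decode3(0xE2, 0x82, 0xAC))); // 20ac
    }
}
```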
    

    If the first byte of a group matches the pattern 1111xxxx or the pattern 10xxxxxx, then a UTFDataFormatException is thrown.

    If end of file is encountered at any time during this entire process, then an EOFException is thrown.
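
The error cases can be reproduced by feeding hand-built byte sequences to DataInputStream.readUTF. This is only a sketch; the class ReadUtfErrors and the helper tryRead are invented for the example, and the first two bytes of each input are the UTF length.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ReadUtfErrors {
    /** Runs readUTF on the given raw bytes and reports the result or the exception class name. */
    public static String tryRead(int... raw) {
        byte[] bytes = new byte[raw.length];
        for (int i = 0; i < raw.length; i++) {
            bytes[i] = (byte) raw[i];
        }
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            return "ok: " + in.readUTF();
        } catch (IOException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryRead(0x00, 0x01, 0x41));       // ok: A
        System.out.println(tryRead(0x00, 0x01, 0xF0));       // UTFDataFormatException (1111xxxx)
        System.out.println(tryRead(0x00, 0x01, 0x80));       // UTFDataFormatException (10xxxxxx)
        System.out.println(tryRead(0x00, 0x02, 0xC3, 0x41)); // UTFDataFormatException (bad second byte)
        System.out.println(tryRead(0x00, 0x05, 0x41));       // EOFException (length 5, only 1 byte follows)
    }
}
```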

    After every group has been converted to a character by this process, the characters are gathered, in the same order in which their corresponding groups were read from the input stream, to form a String, which is returned.
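
Putting the group rules together, a hand-written decoder might look roughly like the sketch below. It is not the JDK implementation: the class ModifiedUtf8Decoder is a hypothetical name, the input is assumed to be the bytes that follow the 2-byte UTF length, and it throws IllegalArgumentException where readUTF would throw UTFDataFormatException, purely to keep the example free of checked exceptions.

```java
public class ModifiedUtf8Decoder {
    /** Decodes a modified UTF-8 payload (without the length prefix) group by group. */
    public static String decode(byte[] bytes) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < bytes.length) {
            int a = bytes[i] & 0xFF;
            if ((a & 0x80) == 0) {                 // 0xxxxxxx: single byte, zero-extended
                out.append((char) a);
                i += 1;
            } else if ((a & 0xE0) == 0xC0) {       // 110xxxxx: one more byte expected
                if (i + 1 >= bytes.length) throw bad(i);
                int b = bytes[i + 1] & 0xFF;
                if ((b & 0xC0) != 0x80) throw bad(i + 1);
                out.append((char) (((a & 0x1F) << 6) | (b & 0x3F)));
                i += 2;
            } else if ((a & 0xF0) == 0xE0) {       // 1110xxxx: two more bytes expected
                if (i + 2 >= bytes.length) throw bad(i);
                int b = bytes[i + 1] & 0xFF;
                int c = bytes[i + 2] & 0xFF;
                if ((b & 0xC0) != 0x80 || (c & 0xC0) != 0x80) throw bad(i + 1);
                out.append((char) (((a & 0x0F) << 12) | ((b & 0x3F) << 6) | (c & 0x3F)));
                i += 3;
            } else {                               // 1111xxxx or 10xxxxxx: never a valid first byte
                throw bad(i);
            }
        }
        return out.toString();
    }

    // Stand-in for UTFDataFormatException (an assumption of this sketch).
    private static IllegalArgumentException bad(int pos) {
        return new IllegalArgumentException("malformed input around byte " + pos);
    }

    public static void main(String[] args) {
        byte[] data = {0x41, (byte) 0xC0, (byte) 0x80, (byte) 0xE2, (byte) 0x82, (byte) 0xAC};
        String s = decode(data);
        System.out.println(s.length());                        // 3
        System.out.println(Integer.toHexString(s.charAt(2)));  // 20ac
    }
}
```

Each surrogate of a supplementary character arrives as its own 3-byte group, so the decoder needs no special case for them: the two chars simply end up next to each other in the String.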

    The writeUTF method of interface DataOutput may be used to write data that is suitable for reading by this method.
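
A round trip through writeUTF and readUTF could be sketched as follows. The class UtfRoundTrip and the method roundTrip are names chosen for this example; the in-memory streams stand in for whatever file or socket streams would be used in practice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class UtfRoundTrip {
    /** Writes s with writeUTF into memory and reads it back with readUTF. */
    public static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buf)) {
                out.writeUTF(s);
            }
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
                return in.readUTF();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASCII, the null character, a 3-byte character, and a surrogate pair.
        String original = "A\0\u20AC\uD83D\uDE00";
        System.out.println(original.equals(roundTrip(original))); // true
    }
}
```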

© 2021 The Archive Base. All Rights Reserved