The Archive Base


Editorial Team
Asked: May 27, 2026


Does anyone know if the standard Java library (any version) provides a means of calculating the length of the binary encoding of a string (specifically UTF-8 in this case) without actually generating the encoded output? In other words, I’m looking for an efficient equivalent of this:

"some really long string".getBytes("UTF-8").length

I need to calculate a length prefix for potentially long serialized messages.
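For context, here is one way such a length prefix might be used. This is a minimal sketch under assumptions that are not in the question: the frame format (a 4-byte big-endian length written with DataOutputStream.writeInt, followed by the UTF-8 payload) and the names LengthPrefixedWriter, utf8Length and writeMessage are hypothetical, and the input is assumed to contain no unpaired surrogates. The payload is streamed through an OutputStreamWriter, so the encoded bytes are never collected into one array. DataOutputStream.writeUTF is not a substitute here: it writes modified UTF-8 behind a 2-byte length, which caps the encoded size at 65535 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class LengthPrefixedWriter {
  // Counts UTF-8 bytes without encoding: 1, 2 or 3 bytes per BMP char,
  // 4 bytes per surrogate pair. Assumes well-formed input (no lone surrogates).
  static int utf8Length(CharSequence s) {
    int count = 0;
    for (int i = 0; i < s.length(); i++) {
      char ch = s.charAt(i);
      if (ch <= 0x7F) {
        count += 1;
      } else if (ch <= 0x7FF) {
        count += 2;
      } else if (Character.isHighSurrogate(ch)) {
        count += 4; // one supplementary code point
        i++;        // skip the low surrogate
      } else {
        count += 3;
      }
    }
    return count;
  }

  // Hypothetical frame: 4-byte big-endian length, then the UTF-8 payload.
  static void writeMessage(DataOutputStream out, String message) throws IOException {
    out.writeInt(utf8Length(message));
    Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
    w.write(message);
    w.flush(); // push the encoder's buffered bytes into 'out' without closing it
  }

  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    writeMessage(new DataOutputStream(buf), "h\u00e9llo \u20ac");
    System.out.println(buf.size());
  }
}
```

Running main should print 14: a 4-byte prefix plus 10 payload bytes (h 1, é 2, "llo" 3, space 1, € 3).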


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 12:40 pm

    Here’s an implementation based on the UTF-8 specification:

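    public class Utf8LenCounter {
      /**
       * Returns the number of bytes in the UTF-8 encoding of the sequence.
       * Assumes well-formed UTF-16: every high surrogate is followed by a
       * low surrogate.
       */
      public static int length(CharSequence sequence) {
        int count = 0;
        for (int i = 0, len = sequence.length(); i < len; i++) {
          char ch = sequence.charAt(i);
          if (ch <= 0x7F) {
            count++;        // U+0000..U+007F: 1 byte
          } else if (ch <= 0x7FF) {
            count += 2;     // U+0080..U+07FF: 2 bytes
          } else if (Character.isHighSurrogate(ch)) {
            count += 4;     // surrogate pair = one supplementary code point: 4 bytes
            ++i;            // skip the low surrogate
          } else {
            count += 3;     // remaining BMP chars: 3 bytes
          }
        }
        return count;
      }
    }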
    

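    This implementation is not tolerant of malformed strings: an unpaired high surrogate is counted as 4 bytes (and the character after it is skipped), and an unpaired low surrogate as 3, whereas String.getBytes replaces each unpaired surrogate with a single '?' byte.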
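If the count has to match String.getBytes(StandardCharsets.UTF_8) even for malformed input, the loop can check that a high surrogate really is followed by a low surrogate. This is a minimal sketch, not part of the original answer; the class name TolerantUtf8LenCounter is hypothetical, and it assumes the JDK behaviour of replacing every unpaired surrogate with the one-byte '?' replacement.

```java
import java.nio.charset.StandardCharsets;

public class TolerantUtf8LenCounter {
  /**
   * Counts the bytes String.getBytes(StandardCharsets.UTF_8) would produce,
   * assuming each unpaired surrogate is replaced by the single byte '?'.
   */
  public static long length(CharSequence s) {
    long count = 0; // long: ~3 bytes per char can exceed Integer.MAX_VALUE
    int len = s.length();
    for (int i = 0; i < len; i++) {
      char ch = s.charAt(i);
      if (ch <= 0x7F) {
        count += 1;
      } else if (ch <= 0x7FF) {
        count += 2;
      } else if (Character.isHighSurrogate(ch)
          && i + 1 < len && Character.isLowSurrogate(s.charAt(i + 1))) {
        count += 4; // well-formed pair: one supplementary code point
        i++;        // consume the low surrogate as well
      } else if (Character.isSurrogate(ch)) {
        count += 1; // unpaired surrogate: replaced by '?'
      } else {
        count += 3;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    String[] samples = {"abc", "\u00e9\u20ac", "\uD83D\uDE00", "\uD800", "x\uDC00y"};
    for (String s : samples) {
      // Both numbers on each line are expected to match.
      System.out.println(length(s) + " " + s.getBytes(StandardCharsets.UTF_8).length);
    }
  }
}
```

The return type is long because a String can hold up to Integer.MAX_VALUE chars, and three bytes per char would overflow an int.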

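    Here's a JUnit 4 test for verification. It encodes every defined code point outside the surrogate range, one at a time, and compares the result with String.getBytes: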

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import org.junit.Assert;
    import org.junit.Test;

    public class LenCounterTest {
      @Test public void testUtf8Len() {
        Charset utf8 = StandardCharsets.UTF_8;
        AllCodepointsIterator iterator = new AllCodepointsIterator();
        while (iterator.hasNext()) {
          // Encode each code point on its own and compare the byte counts.
          String test = new String(Character.toChars(iterator.next()));
          Assert.assertEquals(test.getBytes(utf8).length,
                              Utf8LenCounter.length(test));
        }
      }

      /** Walks every defined, non-surrogate code point below MAX. */
      private static class AllCodepointsIterator {
        private static final int MAX = 0x10FFFF; //see http://unicode.org/glossary/
        private static final int SURROGATE_FIRST = 0xD800;
        private static final int SURROGATE_LAST = 0xDFFF;
        private int codepoint = 0;
        public boolean hasNext() { return codepoint < MAX; }
        public int next() {
          int ret = codepoint;
          codepoint = nextAfter(codepoint);
          return ret;
        }
        private int nextAfter(int cp) {
          while (cp++ < MAX) {
            // Surrogates are not encodable on their own; skip the whole block.
            if (cp == SURROGATE_FIRST) { cp = SURROGATE_LAST + 1; }
            if (!Character.isDefined(cp)) { continue; }
            return cp;
          }
          return MAX; // hasNext() is false from here on
        }
      }
    }
    

    Please excuse the compact formatting.
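On the original question of what the standard library offers: one option that avoids materializing the whole output is to drive a java.nio.charset.CharsetEncoder into a small, reused ByteBuffer and count the bytes as they are produced. This still does the full encoding work, so it is likely slower than the arithmetic above, but memory stays bounded regardless of string length. A minimal sketch; the class name EncoderLengthCounter and the 8 KiB buffer size are arbitrary choices, not an existing API. A freshly created encoder reports malformed input, so an unpaired surrogate raises CharacterCodingException instead of being counted as '?'.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class EncoderLengthCounter {
  // Arbitrary scratch size; it must hold at least 4 bytes for the encoder to make progress.
  private static final int BUFFER_SIZE = 8192;

  /** Counts UTF-8 bytes by encoding into a reused buffer and discarding the output. */
  public static long utf8Length(CharSequence s) throws CharacterCodingException {
    CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder(); // REPORTs malformed input by default
    CharBuffer in = CharBuffer.wrap(s);
    ByteBuffer out = ByteBuffer.allocate(BUFFER_SIZE);
    long count = 0;

    CoderResult result;
    do {
      result = encoder.encode(in, out, true); // true: this is all the input there is
      if (result.isError()) {
        result.throwException();              // e.g. MalformedInputException for a lone surrogate
      }
      count += out.position();                // bytes written in this round
      out.clear();                            // reuse the buffer; the bytes are not needed
    } while (result.isOverflow());            // OVERFLOW: buffer full, input remains

    // Let the encoder emit any final bytes (normally none for UTF-8).
    do {
      result = encoder.flush(out);
      count += out.position();
      out.clear();
    } while (result.isOverflow());

    return count;
  }

  public static void main(String[] args) throws CharacterCodingException {
    String s = "caf\u00e9 \uD83D\uDE00";
    System.out.println(utf8Length(s) + " " + s.getBytes(StandardCharsets.UTF_8).length);
  }
}
```

Running main should print "10 10". Calling encoder.onMalformedInput(CodingErrorAction.REPLACE) before encoding should make unpaired surrogates count as the encoder's default one-byte '?' replacement, which is what String.getBytes does.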



Related Questions

Does anyone know of any 'standard' way to interface with a telephony system (think
Does anyone know of any best practices or 'standard' techniques for implementing authentication between
Does anyone know of any standard algorithms to determine an affine transformation matrix based
Does anyone know of any platforms supported by the C standard, for which there
Does anyone know of a library (preferably java) that can give me neighboring keys
Does anyone know the implementation details for the standard java priority queue? heap? skiplist?
Does anyone know a standard way to keep alive the http session as long
Does anyone know how to implement the standard bubble message that warns users whenever
Does anyone know if there's a de-facto standard (i.e., TR1 or Boost) C++ function
Does anyone know of a good wrapper for the Windows ADSI libraries for Java?


© 2021 The Archive Base. All Rights Reserved