The Archive Base

Editorial Team
Asked: May 25, 2026 (2026-05-25T12:25:47+00:00)

In the Java and C# implementation of String, is the underlying information a null-terminated char array like in C/C++?

(In addition to other information like size, etc.)

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 12:25 pm (2026-05-25T12:25:48+00:00)

    No. It is a sequence of UTF-16 code units and a length. Java and C# strings can contain embedded NULs.

    Each UTF-16 code-unit occupies two bytes, so you can think of the string "\n\0\n" as:

    {
      length: 3,  // 3 pairs of bytes == 3 UTF-16 code units
      bytes:  [0, 10, // \n
               0, 0,  // \0
               0, 10] // \n
    }
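
    As a quick check of that layout, here is a minimal Java sketch (the class and method names are made up for illustration). It encodes the string as big-endian UTF-16 without a byte-order mark, which happens to match the byte order drawn above. This shows the logical content of the string, not necessarily how a particular VM lays it out in memory.

    ```java
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    class EmbeddedNulDemo {
        // Encodes s as big-endian UTF-16 with no byte-order mark.
        static byte[] utf16Bytes(String s) {
            return s.getBytes(StandardCharsets.UTF_16BE);
        }

        public static void main(String[] args) {
            String s = "\n\0\n"; // newline, NUL, newline

            // The embedded NUL counts like any other code unit.
            System.out.println(s.length());        // 3
            System.out.println((int) s.charAt(1)); // 0

            // Two bytes per code unit, and no trailing terminator.
            System.out.println(Arrays.toString(utf16Bytes(s))); // [0, 10, 0, 0, 0, 10]
        }
    }
    ```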
    

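    Note that the last byte in bytes is not 0: there is no terminator. The length field, counted in code units rather than bytes, tells how much of the array is used. This makes a very efficient substring possible: reuse the same array with a different length (and an offset, if your VM implementation can’t point into the middle of an array). Older Java runtimes shared the array this way; since Java 7u6, String.substring copies the code units instead, and .NET's Substring has always allocated a new string.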
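
    The array-sharing idea can be sketched as follows. SharedSlice is a hypothetical class written for this answer, not a JDK or .NET type, and it is not how java.lang.String is implemented today. It only assumes an immutable backing array plus an offset and a length.

    ```java
    final class SharedSlice {
        private final char[] units; // shared backing array, never mutated after construction
        private final int offset;   // index of this slice's first code unit
        private final int length;   // number of code units in this slice

        private SharedSlice(char[] units, int offset, int length) {
            this.units = units;
            this.offset = offset;
            this.length = length;
        }

        static SharedSlice of(String s) {
            return new SharedSlice(s.toCharArray(), 0, s.length());
        }

        int length() {
            return length;
        }

        char charAt(int index) {
            if (index < 0 || index >= length) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return units[offset + index];
        }

        // O(1): no code units are copied, only the offset and length change.
        SharedSlice substring(int begin, int end) {
            if (begin < 0 || end > length || begin > end) {
                throw new IndexOutOfBoundsException("range " + begin + ".." + end);
            }
            return new SharedSlice(units, offset + begin, end - begin);
        }

        boolean sharesStorageWith(SharedSlice other) {
            return units == other.units;
        }

        @Override
        public String toString() {
            return new String(units, offset, length);
        }

        public static void main(String[] args) {
            SharedSlice whole = SharedSlice.of("hello\0world");
            SharedSlice tail = whole.substring(6, 11);
            System.out.println(tail);                          // world
            System.out.println(tail.sharesStorageWith(whole)); // true
        }
    }
    ```

    The trade-off is that a small slice keeps the whole backing array reachable, so sharing saves copying at the cost of possibly retaining more memory than the slice needs.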

    UTF-16 (16-bit Unicode Transformation Format) is a character encoding for Unicode capable of encoding 1,112,064 numbers (called code points) in the Unicode code space from 0 to 0x10FFFF. It produces a variable-length result of either one or two 16-bit code units per code point.

    From the Javadoc for java.lang.String:

    A String represents a string in the UTF-16 format in which supplementary characters are represented by surrogate pairs (see the section Unicode Character Representations in the Character class for more information). Index values refer to char code units, so a supplementary character uses two positions in a String.
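
    A short Java sketch of what "a supplementary character uses two positions" means in practice. The class and helper names are illustrative; the Character and String methods are standard JDK calls.

    ```java
    class SupplementaryDemo {
        // Builds a one-code-point string; supplementary code points become a surrogate pair.
        static String fromCodePoint(int codePoint) {
            return new String(Character.toChars(codePoint));
        }

        public static void main(String[] args) {
            // U+1F600 lies outside the Basic Multilingual Plane.
            String s = "a" + fromCodePoint(0x1F600) + "b";

            System.out.println(s.length());                      // 4 (code units)
            System.out.println(s.codePointCount(0, s.length())); // 3 (code points)

            // Indexes 1 and 2 hold the two halves of the surrogate pair.
            System.out.println(Character.isHighSurrogate(s.charAt(1))); // true
            System.out.println(Character.isLowSurrogate(s.charAt(2)));  // true

            // Skipping two code points from index 0 lands on 'b' at index 3.
            System.out.println(s.offsetByCodePoints(0, 2));      // 3
        }
    }
    ```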

    C#'s System.String is defined similarly:

    Each Unicode character in a string is defined by a Unicode scalar value, also called a Unicode code point or the ordinal (numeric) value of the Unicode character. Each code point is encoded using UTF-16 encoding, and the numeric value of each element of the encoding is represented by a Char. The resulting collection of Char objects constitutes the String.

    I’m not sure whether C# guards against orphaned surrogates, but the above text seems to mix the terms "scalar value" and "code point", which is confusing. unicode.org defines a scalar value as:

    Any Unicode code point except high-surrogate and low-surrogate code points

    Java definitely takes the code point view and does not attempt to guard against invalid scalar values in strings, so a String may contain unpaired surrogates.
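
    Because Java accepts such strings without complaint, code that needs well-formed UTF-16 has to check for itself. A minimal sketch of such a check follows; hasUnpairedSurrogate is a hypothetical helper, not a JDK method.

    ```java
    class SurrogateCheck {
        // Returns true if s contains a high surrogate not followed by a low surrogate,
        // or a low surrogate not preceded by a high surrogate.
        static boolean hasUnpairedSurrogate(String s) {
            for (int i = 0; i < s.length(); i++) {
                char c = s.charAt(i);
                if (Character.isHighSurrogate(c)) {
                    if (i + 1 < s.length() && Character.isLowSurrogate(s.charAt(i + 1))) {
                        i++; // skip the low half of a valid pair
                    } else {
                        return true;
                    }
                } else if (Character.isLowSurrogate(c)) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            String lone = "\uD83D"; // a high surrogate on its own is a legal Java String
            System.out.println(lone.length());                            // 1
            System.out.println(hasUnpairedSurrogate(lone));               // true
            System.out.println(hasUnpairedSurrogate("a\uD83D\uDE00b"));   // false
        }
    }
    ```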

    "Strings Immutability and Persistence" explains the efficiency benefits of this representation.

    One of the benefits of the immutable data types I’ve talked about here previously is that they are not just immutable, they are also "persistent". By "persistent", I mean an immutable data type such that common operations on that type (like adding a new item to a queue, or removing an item from a tree) can re-use most or all of the memory of an existing data structure. Since it is all immutable, you can re-use its parts without worrying about them changing on you.

    EDIT:
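    The above is true conceptually and in practice, but JVM and CLR implementations are free to do things differently in certain situations. HotSpot, for example, has since Java 9 stored strings containing only Latin-1 characters one byte per character (compact strings), while still exposing UTF-16 code units through the String API.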

    The Java Virtual Machine Specification mandates how string constants are laid out in .class files (a modified UTF-8 encoding), and JNI's jstring type abstracts away the in-memory representation. So a VM could, in theory, represent a string in memory as a NUL-terminated modified UTF-8 string, with a two-byte form used for embedded NUL characters, instead of the int32 length plus uint16[] code-unit array that allows efficient random access to code units.
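
    The two-byte form for NUL can be observed from Java itself, because DataOutputStream.writeUTF emits the same modified UTF-8 encoding (preceded by a two-byte length). The sketch below uses it purely to show the encoding; the class and helper names are illustrative, and nothing here reflects what a VM does internally.

    ```java
    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    class ModifiedUtf8Demo {
        // Returns what writeUTF produces: a 2-byte big-endian length, then modified UTF-8.
        static byte[] modifiedUtf8(String s) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                out.writeUTF(s);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            return buffer.toByteArray();
        }

        public static void main(String[] args) {
            String s = "a\0b";

            // Standard UTF-8 encodes U+0000 as a single 0 byte.
            System.out.println(Arrays.toString(s.getBytes(StandardCharsets.UTF_8))); // [97, 0, 98]

            // Modified UTF-8 encodes U+0000 as 0xC0 0x80, so no 0 byte appears in the payload.
            System.out.println(Arrays.toString(modifiedUtf8(s))); // [0, 4, 97, -64, -128, 98]
        }
    }
    ```

    Since the payload never contains a 0 byte, a 0 could then serve as a terminator, which is what makes the hypothetical NUL-terminated layout workable.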

    VMs don’t do this in practice though. "The Most Expensive One-byte Mistake" argues that NUL-terminated strings were a huge mistake in C, so I doubt VMs will adopt them internally for efficiency reasons.

    The best candidate I have been able to come up with is the C/Unix/Posix use of NUL-terminated text strings. The choice was really simple: Should the C language represent strings as an address + length tuple or just as the address with a magic character (NUL) marking the end?

    …

    Thinking a bit about virtual memory systems settles that question for us. Optimizing the movement of a known-length string of bytes can take advantage of the full width of memory buses and cache lines, without ever touching a memory location that is not part of the source or destination string.

    One example is FreeBSD’s libc, where the bcopy(3)/memcpy(3) implementation will move as much data as possible in chunks of "unsigned long," typically 32 or 64 bits, and then "mop up any trailing bytes" as the comment describes it, with byte-wide operations.

    If the source string is NUL terminated, however, attempting to access it in units larger than bytes risks attempting to read characters after the NUL. If the NUL character is the last byte of a [virtual memory] page and the next [virtual memory] page is not defined, this would cause the process to die from an unwarranted "page not present" fault.
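
    A rough Java illustration of the difference the quoted article describes. Java arrays are bounds-checked, so the page-fault hazard itself cannot be reproduced here; the sketch only shows that a known length permits one bulk copy, while a terminator forces a byte-by-byte scan first. CopyDemo and its methods are hypothetical names.

    ```java
    import java.util.Arrays;

    class CopyDemo {
        // Length known up front: a single bulk copy, which the runtime may perform in wide chunks.
        static byte[] copyKnownLength(byte[] src, int length) {
            byte[] dst = new byte[length];
            System.arraycopy(src, 0, dst, 0, length);
            return dst;
        }

        // Terminator-based: the length must first be discovered one byte at a time.
        // Assumes src contains at least one 0 byte.
        static byte[] copyNulTerminated(byte[] src) {
            int length = 0;
            while (src[length] != 0) {
                length++;
            }
            return copyKnownLength(src, length);
        }

        public static void main(String[] args) {
            byte[] buf = {'h', 'i', 0, 'x'};
            System.out.println(Arrays.toString(copyKnownLength(buf, 4))); // [104, 105, 0, 120]
            System.out.println(Arrays.toString(copyNulTerminated(buf)));  // [104, 105]
        }
    }
    ```

    Note that the length-based copy also carries the embedded 0 through intact, whereas the terminator-based copy silently stops at it.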
