The Archive Base

Asked: May 16, 2026 (2026-05-16T07:36:48+00:00)

Here’s an excerpt from java.text.CharacterIterator documentation: This interface defines a protocol for bidirectional iteration

Here’s an excerpt from java.text.CharacterIterator documentation:

  • This interface defines a protocol for bidirectional iteration over text. The iterator iterates over a bounded sequence of characters. […] The methods previous() and next() are used for iteration. They return DONE if […], signaling that the iterator has reached the end of the sequence.

  • static final char DONE: Constant that is returned when the iterator has reached either the end or the beginning of the text. The value is \uFFFF, the "not a character" value which should not occur in any valid Unicode string.

The part I’m having trouble understanding is the last sentence of the DONE description, the claim that \uFFFF "should not occur in any valid Unicode string". From my tests, a Java String can most certainly contain \uFFFF, and there doesn’t seem to be any problem with it, except with the prescribed CharacterIterator traversal idiom, which breaks because of a false positive (e.g. next() returns '\uFFFF' == DONE when it’s not really "done").

Here’s a snippet to illustrate the "problem" (see also on ideone.com):

import java.text.*;
public class CharacterIteratorTest {

    // this is the prescribed traversal idiom from the documentation
    public static void traverseForward(CharacterIterator iter) {
        for (char c = iter.first(); c != CharacterIterator.DONE; c = iter.next()) {
            System.out.print(c);
        }
    }

    public static void main(String[] args) {
        String s = "abc\uFFFFdef";

        System.out.println(s);
        // abc?def

        System.out.println(s.indexOf('\uFFFF'));
        // 3
        
        traverseForward(new StringCharacterIterator(s));
        // abc
    }
}

So what is going on here?

  • Is the prescribed traversal idiom "broken" because it makes the wrong assumption about \uFFFF?
  • Is the StringCharacterIterator implementation "broken" because it doesn’t e.g. throw an IllegalArgumentException if in fact \uFFFF is forbidden in valid Unicode strings?
  • Is it actually true that valid Unicode strings should not contain \uFFFF?
  • If that’s true, then is Java "broken" for violating the Unicode specification by (for the most part) allowing String to contain \uFFFF anyway?
1 Answer

  1. Editorial Team, added an answer on May 16, 2026 at 7:36 am

    EDIT (2013-12-17): Peter O. brings up an excellent point below, which renders this answer wrong. Old answer below, for historical accuracy.


    Answering your questions:

    Is the prescribed traversal idiom “broken” because it makes the wrong assumption about \uFFFF?

    No. U+FFFF is a so-called non-character. From Section 16.7 of the Unicode Standard:

    Noncharacters are code points that are permanently reserved in the Unicode Standard for internal use. They are forbidden for use in open interchange of Unicode text data.

    …

    The Unicode Standard sets aside 66 noncharacter code points. The last two code points of
    each plane are noncharacters: U+FFFE and U+FFFF on the BMP, U+1FFFE and U+1FFFF
    on Plane 1, and so on, up to U+10FFFE and U+10FFFF on Plane 16, for a total of 34 code
    points. In addition, there is a contiguous range of another 32 noncharacter code points in
    the BMP: U+FDD0..U+FDEF.
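The arithmetic in the quoted passage (34 plane-final code points plus 32 in U+FDD0..U+FDEF) can be checked with a small sketch. The class name NoncharacterCheck and its methods are hypothetical names used only for illustration; they are not part of the JDK or of the original answer.

```java
// Sketch only: NoncharacterCheck and its method names are hypothetical.
final class NoncharacterCheck {

    // True for the code points the quoted passage lists as noncharacters.
    static boolean isNoncharacter(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            return false; // outside U+0000..U+10FFFF
        }
        // The contiguous BMP range U+FDD0..U+FDEF (32 code points).
        if (codePoint >= 0xFDD0 && codePoint <= 0xFDEF) {
            return true;
        }
        // The last two code points of each plane: low 16 bits are FFFE or FFFF.
        return (codePoint & 0xFFFE) == 0xFFFE;
    }

    // Counts noncharacters across the whole Unicode code space.
    static int countNoncharacters() {
        int count = 0;
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (isNoncharacter(cp)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNoncharacters());   // 66
        System.out.println(isNoncharacter(0xFFFF)); // true
        System.out.println(isNoncharacter(0xFFFD)); // false
    }
}
```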

    Is the StringCharacterIterator implementation “broken” because it doesn’t e.g. throw an IllegalArgumentException if in fact \uFFFF is forbidden in valid Unicode strings?

    Not quite. Applications are allowed to use those code points internally in any way they want. Quoting the standard again:

    Applications are free to use any of these noncharacter code points internally but should never attempt to exchange them. If a noncharacter is received in open interchange, an application is not required to interpret it in any way. It is good practice, however, to recognize it as a noncharacter and to take appropriate action, such as replacing it with U+FFFD REPLACEMENT CHARACTER, to indicate the problem in the text. It is not recommended to simply delete noncharacter code points from such text, because of the potential security issues caused by deleting uninterpreted characters.
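As a rough illustration of the "replace rather than delete" advice in the quote, the following sketch substitutes U+FFFD for every noncharacter in a received string. The class NoncharacterSanitizer and its methods are hypothetical helpers, not an existing API, and whether an application should sanitize input this way is its own decision.

```java
// Sketch only: NoncharacterSanitizer is a hypothetical helper, not a JDK class.
final class NoncharacterSanitizer {

    // Expects a valid code point, e.g. one obtained from String.codePointAt.
    static boolean isNoncharacter(int codePoint) {
        return (codePoint >= 0xFDD0 && codePoint <= 0xFDEF)
                || (codePoint & 0xFFFE) == 0xFFFE;
    }

    // Replaces each noncharacter with U+FFFD instead of deleting it.
    static String replaceNoncharacters(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i); // handles surrogate pairs
            out.appendCodePoint(isNoncharacter(cp) ? 0xFFFD : cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String received = "abc\uFFFFdef";
        String cleaned = replaceNoncharacters(received);
        System.out.println(cleaned.equals("abc\uFFFDdef"));        // true
        System.out.println(cleaned.length() == received.length()); // true
    }
}
```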

    So while you should never encounter such a string from the user, another application or a file, you may well put it into a Java String if you know what you’re doing (this basically means that you cannot use the prescribed CharacterIterator traversal idiom on that string, though).
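If such a string still has to be traversed with a CharacterIterator, one possible workaround (not something the original answer proposes) is to bound the loop by the iterator's index rather than by comparing against DONE. The sketch below relies only on the documented CharacterIterator contract that next() leaves the index at getEndIndex() once the end is reached; the class and method names are hypothetical.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Sketch only: IndexBoundedTraversal is a hypothetical name.
final class IndexBoundedTraversal {

    // The prescribed idiom: stops early at the first U+FFFF in the text.
    static String traverseWithDone(CharacterIterator iter) {
        StringBuilder seen = new StringBuilder();
        for (char c = iter.first(); c != CharacterIterator.DONE; c = iter.next()) {
            seen.append(c);
        }
        return seen.toString();
    }

    // Alternative: terminate on the index, so a real U+FFFF is not mistaken for DONE.
    static String traverseByIndex(CharacterIterator iter) {
        StringBuilder seen = new StringBuilder();
        for (char c = iter.first(); iter.getIndex() < iter.getEndIndex(); c = iter.next()) {
            seen.append(c);
        }
        return seen.toString();
    }

    public static void main(String[] args) {
        String s = "abc\uFFFFdef";
        System.out.println(traverseWithDone(new StringCharacterIterator(s)).length()); // 3
        System.out.println(traverseByIndex(new StringCharacterIterator(s)).length());  // 7
    }
}
```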

    Is it actually true that valid Unicode strings should not contain \uFFFF?

    As quoted above, any string used for interchange must not contain them. Within your application you’re free to use them in whatever way you want.

    Of course, a Java char, being just a 16-bit unsigned integer, doesn’t really care about the value it holds either.
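A short sketch of that point: '\uFFFF' is simply the largest char value, a String stores it like any other code unit, and it compares equal to CharacterIterator.DONE. The class name CharIsJustANumber is hypothetical.

```java
import java.text.CharacterIterator;

// Sketch only: CharIsJustANumber is a hypothetical name.
final class CharIsJustANumber {

    // Builds the same test string the question uses.
    static String withMaxChar() {
        return "abc" + Character.MAX_VALUE + "def";
    }

    public static void main(String[] args) {
        char c = '\uFFFF';
        System.out.println((int) c);                     // 65535
        System.out.println(c == Character.MAX_VALUE);    // true
        System.out.println(c == CharacterIterator.DONE); // true

        String s = withMaxChar();
        System.out.println(s.length());        // 7
        System.out.println((int) s.charAt(3)); // 65535
    }
}
```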

    If that’s true, then is Java “broken” for violating the Unicode specification by (for the most part) allowing String to contain \uFFFF anyway?

    No. In fact, the section on noncharacters even suggests the use of U+FFFF as sentinel value:

    In effect, noncharacters can be thought of as application-internal private-use code points.
    Unlike the private-use characters discussed in Section 16.5, Private-Use Characters, which
    are assigned characters and which are intended for use in open interchange, subject to
    interpretation by private agreement, noncharacters are permanently reserved (unassigned)
    and have no interpretation whatsoever outside of their possible application-internal private
    uses.

    U+FFFF and U+10FFFF. These two noncharacter code points have the attribute of being
    associated with the largest code unit values for particular Unicode encoding forms. In
    UTF-16, U+FFFF is associated with the largest 16-bit code unit value, FFFF₁₆. U+10FFFF is
    associated with the largest legal UTF-32 32-bit code unit value, 10FFFF₁₆. This attribute
    renders these two noncharacter code points useful for internal purposes as sentinels. For
    example, they might be used to indicate the end of a list, to represent a value in an index
    guaranteed to be higher than any valid character value, and so on.

    CharacterIterator follows this in that it returns U+FFFF when no more characters are available. Of course, this means that if you have another use for that code point in your application, you may want to use a different noncharacter for that purpose, since U+FFFF is already taken, at least if you’re using CharacterIterator.
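To make that last suggestion concrete, here is a minimal sketch of an application that marks field boundaries with an internal noncharacter and counts fields using the prescribed traversal idiom. The field-packing scheme and the names OwnSentinelSketch and countFields are assumptions made for illustration. With U+FFFE as the marker the idiom sees the whole string; with U+FFFF it stops at the first marker.

```java
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;

// Sketch only: the field-packing scheme and all names here are hypothetical.
final class OwnSentinelSketch {

    // Counts separator-delimited fields using the prescribed DONE-based idiom.
    static int countFields(String packed, char separator) {
        int fields = 1;
        CharacterIterator iter = new StringCharacterIterator(packed);
        for (char c = iter.first(); c != CharacterIterator.DONE; c = iter.next()) {
            if (c == separator) {
                fields++;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // U+FFFE as the internal marker: the traversal reaches every character.
        System.out.println(countFields("abc\uFFFEdef\uFFFEghi", '\uFFFE')); // 3
        // U+FFFF collides with DONE: the loop ends before any marker is counted.
        System.out.println(countFields("abc\uFFFFdef\uFFFFghi", '\uFFFF')); // 1
    }
}
```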


© 2021 The Archive Base. All Rights Reserved