Editorial Team
Asked: May 20, 2026

In my application I have to validate XML data and pick up all invalid characters

In my application I have to validate XML data and pick up all invalid characters (and put them in CDATA).

My question is quite simple: how do I do it?

I started with the Character.UnicodeBlock methods, but how does this work for characters encoded as several bytes, for example ‘ï’ or ‘é’?

This is my code at the moment (for testing):

import java.io.UnsupportedEncodingException;
import java.lang.Character.UnicodeBlock;

public class InvalidCharTest {

    public static void main(String[] args) {
        try {
            // Encode the sample text as UTF-8 bytes.
            byte[] data = "J'ai prïé et `".getBytes("UTF-8");

            // Print every byte as if it were a single character.
            System.out.print("Data: ");
            for (int i = 0; i < data.length; i++) {
                System.out.print((char) data[i]);
            }
            System.out.println("");

            UnicodeBlock myBlock = null;

            for (int i = 0; i < data.length; i++) {
                // Widen the signed byte to an int (same value as new Byte(b).intValue()).
                int value = data[i];
                System.out.println("[" + i + " => '" + (char) data[i]
                        + "'] Is defined: " + Character.isDefined(value));
                try {
                    myBlock = Character.UnicodeBlock.of(value);
                } catch (IllegalArgumentException e) {
                    // Thrown when the value is not a valid Unicode code point.
                    System.out.println("Count => " + Character.charCount(value));
                }
            }
        } catch (UnsupportedEncodingException e) {
            System.err.println("Unsupported encoding: " + e.getMessage());
        }
        System.out.println("Finished");
    }
}

And this is what I get when I run it:

Data: J'ai pr???? et `
[0 => 'J'] Is defined: true
[1 => '''] Is defined: true
[2 => 'a'] Is defined: true
[3 => 'i'] Is defined: true
[4 => ' '] Is defined: true
[5 => 'p'] Is defined: true
[6 => 'r'] Is defined: true
[7 => '?'] Is defined: false
Count => 1
[8 => '?'] Is defined: false
Count => 1
[9 => '?'] Is defined: false
Count => 1
[10 => '?'] Is defined: false
Count => 1
[11 => ' '] Is defined: true
[12 => 'e'] Is defined: true
[13 => 't'] Is defined: true
[14 => ' '] Is defined: true
[15 => '`'] Is defined: true
Finished

I’m trying to find a way to also handle multi-byte characters, so that I only get ‘false’ for characters that really are invalid.

Maybe a Java library already exists to do that?

It would be very kind if someone could help me.
Thanks in advance.

Regards.

1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 12:16 pm

    A few things:

    • CDATA will not protect you from invalid characters; your junk data will still consist of illegal UTF-8 sequences and may be rejected by XML parsers. Characters that XML does not allow are not allowed inside a CDATA section either.
    • Use a CharsetDecoder configured to report errors, together with an InputStreamReader, to validate character sequences; alternatively, check that the byte sequences are well formed as described in the UTF-8 definition (RFC 3629, which obsoletes RFC 2279).
    • I wouldn’t try parsing XML without an XML parser.
    • Character.isDefined expects a UTF-16 code unit (char) or a Unicode code point (int), not UTF-8 encoded bytes. A byte of 0x80 or above is negative in Java, so your test passes a negative int, which is never a defined code point; that is why every byte of ‘ï’ and ‘é’ reports false.
    • In Java 6, Character.isDefined only knows the code points defined in the Unicode Standard, version 4.0; valid UTF-8 documents may contain code points assigned by later versions (version 6 is out now), for which it will return false. The latest list of assigned code points is in UnicodeData.txt.
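
To make the decoder suggestion above concrete, here is a minimal sketch. It is not the only way to do this, and it makes some assumptions: it needs Java 7 or later (for StandardCharsets), it calls CharsetDecoder.decode on a ByteBuffer directly instead of wrapping an InputStreamReader, and the names Utf8Check, decodeStrict, isXmlChar and countInvalidXmlChars are hypothetical, chosen only for illustration. The first step rejects byte sequences that are not well-formed UTF-8; the second walks the decoded text one code point at a time and tests each against the Char production of XML 1.0.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    /** Strict decode: returns null when the bytes are not well-formed UTF-8. */
    static String decodeStrict(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return decoder.decode(ByteBuffer.wrap(data)).toString();
        } catch (CharacterCodingException e) {
            return null;
        }
    }

    /** True if the code point matches the Char production of XML 1.0. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Counts code points (not bytes, not chars) that XML 1.0 does not allow. */
    static int countInvalidXmlChars(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (!isXmlChar(cp)) {
                count++;
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return count;
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        byte[] data = "J'ai pr\u00ef\u00e9 et `".getBytes(StandardCharsets.UTF_8);
        String text = decodeStrict(data);
        if (text == null) {
            System.out.println("Not well-formed UTF-8");
        } else {
            System.out.println("Invalid XML characters: " + countInvalidXmlChars(text));
        }
    }
}
```

Because the check runs on decoded code points, a multi-byte character such as ‘ï’ is examined once as a whole rather than once per byte. Any character this flags still has to be removed or replaced before the text goes into the document; wrapping it in CDATA does not make it legal.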