The Archive Base

Editorial Team
Asked: May 20, 2026

In my application I have to validate XML data and pick up all invalid characters

In my application I have to validate XML data and pick up all invalid characters (and put them in CDATA).

My question is quite simple: how do I do it?

I started with the Character.UnicodeBlock methods, but how does that work for characters encoded as several bytes, for example ‘ï’ or ‘é’?

This is my code at the moment (for testing):

import java.io.UnsupportedEncodingException;
import java.lang.Character.UnicodeBlock;

public class CharTest {

    public static void main(String[] args) {
        try {
            byte[] data = "J'ai prïé et `".getBytes("UTF-8");

            // Each UTF-8 byte is cast to a char on its own,
            // so multi-byte characters are split into separate values.
            System.out.print("Data: ");
            for (int i = 0; i < data.length; i++) {
                System.out.print((char) data[i]);
            }

            System.out.println("");

            UnicodeBlock myBlock = null;

            for (int i = 0; i < data.length; i++) {
                // byte is signed: values of 0x80 and above become negative ints here
                int value = data[i];
                System.out.println("[" + i + " => '" + (char) data[i]
                        + "'] Is defined: "
                        + Character.isDefined(value));
                try {
                    myBlock = Character.UnicodeBlock.of(value);
                } catch (IllegalArgumentException e) {
                    // thrown when value is not a valid Unicode code point
                    System.out.println("Count => " + Character.charCount(value));
                }
            }
        } catch (UnsupportedEncodingException e) {
            System.err.println("Unsupported encoding: " + e.getMessage());
        }
        System.out.println("Finished");
    }
}

And this is what I get when I run it:

Data: J'ai pr???? et `
[0 => 'J'] Is defined: true
[1 => '''] Is defined: true
[2 => 'a'] Is defined: true
[3 => 'i'] Is defined: true
[4 => ' '] Is defined: true
[5 => 'p'] Is defined: true
[6 => 'r'] Is defined: true
[7 => '?'] Is defined: false
Count => 1
[8 => '?'] Is defined: false
Count => 1
[9 => '?'] Is defined: false
Count => 1
[10 => '?'] Is defined: false
Count => 1
[11 => ' '] Is defined: true
[12 => 'e'] Is defined: true
[13 => 't'] Is defined: true
[14 => ' '] Is defined: true
[15 => '`'] Is defined: true
Finished

I’m trying to find a way to also detect multi-byte characters, and to get a ‘false’ result only for genuinely invalid characters.

Maybe a Java library already exists to do that?

It would be very kind if someone could help me.
Thanks in advance.

Regards.

1 Answer

    Editorial Team
    Added an answer on May 20, 2026 at 12:16 pm

    A few things:

    • CDATA will not protect you from invalid characters; your junk data will still be illegal UTF-8 sequences and may be rejected by XML parsers
    • use a CharsetDecoder configured to report errors, together with an InputStreamReader, to validate character sequences; alternatively, check that the byte sequences are valid as described in RFC 3629 (the current UTF-8 definition, which obsoletes RFC 2279)
    • I wouldn’t try parsing XML without an XML parser
    • Character.isDefined expects a UTF-16 code unit (char) or a Unicode code point (int), not UTF-8 encoded bytes
    • in Java 6, Character.isDefined is limited to code points defined in the Unicode Standard, version 4.0; there may be valid UTF-8 documents that use characters defined by later versions of the standard, and this check will fail for them (version 6 is out now); the latest list of assigned code points is in UnicodeData.txt
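
    The points above could be sketched roughly as follows. This is only an illustration, not a definitive implementation: the class name Utf8XmlCheck and the helpers decodeStrict, isXml10Char and firstIllegalXmlChar are made up for this example, and it assumes Java 7 or later for StandardCharsets (on Java 6, Charset.forName("UTF-8") would be the equivalent). The bytes are first decoded with a CharsetDecoder set to report malformed input, and the resulting String is then walked code point by code point and compared against the Char production of XML 1.0.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8XmlCheck {

    // Hypothetical helper: decodes UTF-8 strictly. With REPORT, malformed byte
    // sequences raise CharacterCodingException instead of being replaced by U+FFFD.
    public static String decodeStrict(byte[] data) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(data)).toString();
    }

    // Hypothetical helper: the XML 1.0 Char production,
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    public static boolean isXml10Char(int codePoint) {
        return codePoint == 0x9 || codePoint == 0xA || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Hypothetical helper: returns the char index of the first code point that
    // XML 1.0 does not allow, or -1 if every code point is allowed.
    public static int firstIllegalXmlChar(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (!isXml10Char(codePoint)) {
                return i;
            }
            // Supplementary characters occupy two chars in a Java String.
            i += Character.charCount(codePoint);
        }
        return -1;
    }

    public static void main(String[] args) {
        // Same text as in the question, written with escapes for 'ï' and 'é'.
        byte[] good = "J'ai pr\u00ef\u00e9 et `".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {'o', 'k', (byte) 0xC3}; // truncated two-byte sequence

        for (byte[] data : new byte[][] {good, bad}) {
            try {
                String text = decodeStrict(data);
                System.out.println("Valid UTF-8, first illegal XML char at: "
                        + firstIllegalXmlChar(text));
            } catch (CharacterCodingException e) {
                System.out.println("Malformed UTF-8: " + e);
            }
        }
    }
}
```

    Working on decoded code points rather than raw bytes is what lets multi-byte characters such as ‘ï’ and ‘é’ pass, while control characters that XML 1.0 forbids are still caught. A real XML parser remains the safer way to decide whether a whole document is well-formed.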
© 2021 The Archive Base. All Rights Reserved