The Archive Base

Editorial Team
Asked: June 12, 2026

I have a String containing binary 0 inside in UTF-8 ( A\u0000B ). JAXB

I have a String containing a binary 0 in UTF-8 ("A\u0000B"). JAXB happily marshals an XML document containing such a character but then fails to unmarshal it:

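import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;

final JAXBContext jaxbContext = JAXBContext.newInstance(Root.class);
final Marshaller marshaller = jaxbContext.createMarshaller();
final Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();

Root root = new Root();
root.value = "A\u0000B"; // NUL character between A and B

final ByteArrayOutputStream os = new ByteArrayOutputStream();
marshaller.marshal(root, os); // succeeds and writes the raw 0x00 byte

// fails: the parser rejects the 0x00 character
unmarshaller.unmarshal(new ByteArrayInputStream(os.toByteArray()));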

The root class is simple:

@XmlRootElement
class Root { @XmlValue String value; }

The output XML also contains the binary 0 between A and B (in hex: 41 00 42), which causes the following error during unmarshalling:

org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 63; 
An invalid XML character (Unicode: 0x0) was found in the element content of the document.

Interestingly, using the raw DOM API (example) produces an escaped 0: A&#0;B, but trying to read it back yields a similar error. Also, 0 (neither binary nor escaped) is not allowed by any XML parser or xmllint (see also: Python + Expat: Error on &#0; entities).

My questions:

  • why does the JAXB/DOM API allow creating invalid XML documents which it cannot read back? Shouldn’t it fail fast during marshalling?

  • is there some elegant and global solution? I saw people tackling this problem by:

    • manually ignoring special characters from input

    • intercepting incoming stream or even

    • implementing some internal com.sun.xml.internal.bind.marshaller.CharacterEscapeHandler class

But shouldn’t a mature XML stack in Java (I’m using 1.7.0_05) handle this either by default or by having some simple setting? I’m looking for escaping, ignoring or failing fast, but the default behavior of generating invalid XML is not acceptable. I believe such fundamental functionality should not require any extra coding on the client side.

1 Answer

  1. Editorial Team
    Added an answer on June 12, 2026 at 11:44 am

    why does the JAXB/DOM API allow creating invalid XML documents which it cannot read back? Shouldn’t it fail fast during marshalling?

    1. You would need to ask the implementors.

    2. It is possible that they thought that the expense of checking every data character serialised was not justified … especially if the parser is then going to check them all over again.

    3. Having decided to implement the serializer this way (or having just done so by mistake), if they then changed the behaviour to do strict checking by default, they would break existing code that depends on being able to serialise illegal XML.

    But shouldn’t a mature XML stack in Java (I’m using 1.7.0_05) handle this either by default or by having some simple setting?

    Not necessarily … if you accept reason #2 above. Even a simple setting could have a measurable impact on performance.


    Also 0 (neither binary nor escaped) is not allowed by any XML parser or xmllint …

    Quite rightly so! It is forbidden by the XML spec.
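
For reference, the rule in question is the Char production of XML 1.0, which admits only tab, line feed, carriage return, and the ranges U+0020 to U+D7FF, U+E000 to U+FFFD and U+10000 to U+10FFFF. A minimal sketch of a fail-fast check you could run on a String before marshalling might look like this (the class and method names are made up for illustration, they are not part of JAXB or the JDK):

```java
/** Hypothetical helper: checks strings against the XML 1.0 Char production. */
final class XmlChars {
    private XmlChars() {}

    /** True if the code point is a legal XML 1.0 character. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the UTF-16 index of the first illegal character, or -1 if there is none. */
    static int firstIllegalIndex(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!isXmlChar(cp)) {
                return i; // also catches unpaired surrogates
            }
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

Calling `XmlChars.firstIllegalIndex(root.value)` before `marshal` and throwing when the result is not -1 would give the fail-fast behaviour, at the cost of scanning every string yourself.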

    However, a more interesting test would be to see what happens when you try to generate XML containing an illegal character using other XML stacks.


    is there some elegant and global solution?

    If the problem you are trying to solve is how to send a \u0000 or \u000B, then you need to apply some application-specific encoding to the String before you insert it into the DOM. And the other end needs to deploy the equivalent decoding.
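
One possible application-specific encoding is Base64 over the UTF-8 bytes; any reversible scheme that both ends agree on would do. The sketch below assumes Java 8 or later for `java.util.Base64`, and `XmlSafeText` is a made-up name. On Java 7 the same idea should work with `javax.xml.bind.DatatypeConverter.printBase64Binary` and `parseBase64Binary`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Hypothetical helper: makes arbitrary text safe to embed as XML character data. */
final class XmlSafeText {
    private XmlSafeText() {}

    /** Sender side: encode before putting the value into the DOM or JAXB object. */
    static String encode(String raw) {
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    /** Receiver side: decode after unmarshalling. */
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

The obvious downside is that the value is no longer human-readable in the XML, and every consumer has to know about the encoding.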

    If the problem you are trying to solve is how to detect the bad data before it is too late, you could do this with an output stream filter between the serializer and the final output stream. But if you detect the badness, there is no good (i.e. transparent to the XML consumer) way to fix it.
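
A rough sketch of such a filter is below. It assumes the serializer writes UTF-8 (or another ASCII-compatible encoding), where any byte below 0x20 is a control character in its own right; it would not work for UTF-16 output, and it does not catch an escaped `&#0;` or U+FFFE/U+FFFF. The class name `XmlControlCharGuard` is invented for the example.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical filter: rejects control bytes that XML 1.0 forbids. */
final class XmlControlCharGuard extends FilterOutputStream {

    XmlControlCharGuard(OutputStream out) {
        super(out);
    }

    private static void check(int b) throws IOException {
        int v = b & 0xFF;
        // Only tab, line feed and carriage return are allowed below 0x20.
        if (v < 0x20 && v != 0x09 && v != 0x0A && v != 0x0D) {
            throw new IOException(String.format("Illegal XML control byte 0x%02X in output", v));
        }
    }

    @Override
    public void write(int b) throws IOException {
        check(b);
        out.write(b);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        // Validate the whole chunk first so nothing from a bad chunk is forwarded.
        for (int i = off; i < off + len; i++) {
            check(buf[i]);
        }
        out.write(buf, off, len);
    }
}
```

With the code from the question, you would wrap the target stream, e.g. `marshaller.marshal(root, new XmlControlCharGuard(os))`. I would expect the `IOException` to surface wrapped in a `JAXBException`, but that depends on the implementation.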

© 2021 The Archive Base. All Rights Reserved