I am using JSOUP (java tool for XML files) and I am using following

Question

Asked: May 26, 2026 (2026-05-26T15:35:18+00:00)

I am using JSOUP (a Java tool for XML files) with the following code to read a URL that is saved in an XML file. Here is my code:

Document d = Jsoup.parse(new File("feed.xml"), null);
Element elementCat = d.getElementsByTag("cat").get(0);
String stringUrl = elementCat.ownText();
System.out.println(stringUrl);

The XML input file looks like this:

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<root>
<cat>http://www.isna.ir/ISNA/FullNews.aspx?SrvID=Event&Lang=P</cat>
</root>

My problem is that the output of the program is this:
http://www.isna.ir/ISNA/FullNews.aspx?SrvID=Event⟪=P
instead of this:
http://www.isna.ir/ISNA/FullNews.aspx?SrvID=Event&Lang=P

In other words, it converts “&Lang” to “⟪” automatically.
Please note that the input is not “&Lang;”; it is just “&Lang” without a semicolon.
I want to disable encoding or escaping and I want the raw data.

How can I solve this problem?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T15:35:18+00:00

You’ve got a piece of XML. In XML, there’s a way of escaping markup, since sometimes you just need a piece of text containing < or an attribute with " in its value. Escaping is done using a character entity reference, which starts with an ampersand, followed by a name, followed by a semicolon. Like so: &lt;. That represents <.

Of course, that leaves us with the problem of the ampersand itself. If it’s actually an ampersand you need, rather than some other character entity, you’ll have to encode it thus: &amp;.
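
As a small illustration of that escaping rule, here is a sketch of a helper that escapes raw text before it is written into an XML element. The names XmlEscape and escapeText are made up for this example. It only handles the characters that matter inside element text, and it assumes the input has not been escaped already.

```java
final class XmlEscape {
    /** Escapes the characters that are markup-significant in XML element text. */
    static String escapeText(String raw) {
        // The ampersand must be replaced first; otherwise the ampersands
        // introduced by the later replacements would be escaped a second time.
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;")
                  .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String url = "http://www.isna.ir/ISNA/FullNews.aspx?SrvID=Event&Lang=P";
        // Prints the URL with the bare & turned into &amp;
        System.out.println(escapeText(url));
    }
}
```

Running the URL through such a helper when the file is generated yields text that any XML parser will turn back into the original URL.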

What you’ve got there is XML that isn’t well-formed. The & indicates the start of a character entity reference, but it is followed by Lang with no closing semicolon. Now, maybe jsoup doesn’t make much of a problem of this. But that’s because it’s for HTML parsing and not XML. Since HTML is a bit more lenient than XML, I suppose jsoup simply reads &Lang as the named character reference &Lang; with its semicolon left off. &Lang; stands for ⟪, which is the character showing up in your output.
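
To see how a strict XML parser reacts to the same input, here is a rough sketch using the DOM parser that ships with the JDK (not jsoup). The names XmlCheck and isWellFormed are invented for this example, and the sample strings are shortened versions of the feed in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class XmlCheck {
    /** Returns true if the JDK's XML parser accepts the string as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without logging them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown for any well-formedness violation, such as a bare ampersand.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String broken = "<root><cat>FullNews.aspx?SrvID=Event&Lang=P</cat></root>";
        String fixed = "<root><cat>FullNews.aspx?SrvID=Event&amp;Lang=P</cat></root>";
        System.out.println(isWellFormed(broken)); // prints false
        System.out.println(isWellFormed(fixed));  // prints true
    }
}
```

A strict parser refuses the bare ampersand outright instead of guessing what it was meant to be, which is the leniency difference described above.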

So make sure the XML is well-formed. If that can’t be done, don’t treat it as XML but as HTML. If XML processing is what you’re after, look into SAX, StAX, DOM or JAXB.
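
For the well-formed route, a minimal sketch with the JDK's built-in DOM parser might look like the following. It assumes the feed has been corrected so that the ampersand is written as &amp;. The names CatUrlReader and readCatUrl are made up for this example, and the XML is passed as a string rather than read from feed.xml to keep the sketch self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class CatUrlReader {
    /** Returns the text of the first cat element; assumes at least one exists. */
    static String readCatUrl(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            // getTextContent() returns the text with entity references already resolved.
            return doc.getElementsByTagName("cat").item(0).getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse the XML input", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"no\"?>"
                + "<root>"
                + "<cat>http://www.isna.ir/ISNA/FullNews.aspx?SrvID=Event&amp;Lang=P</cat>"
                + "</root>";
        // Prints http://www.isna.ir/ISNA/FullNews.aspx?SrvID=Event&Lang=P
        System.out.println(readCatUrl(xml));
    }
}
```

The parser resolves &amp; back to a plain &, so the program receives the raw URL without any extra unescaping step.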
