I have to parse content that I fetch from the web, and it can contain special characters. In this case the content string looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<products>
<product>
<id>1</id>
<price>2.14</price>
<title>test ž test</title>
When the content above is passed to the characters() method of a class that extends org.xml.sax.helpers.DefaultHandler:
public class ProductsXMLHandler extends DefaultHandler {
    ...
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // Builds a string from only the chunk delivered by this one call
        String elementValue = new String(ch, start, length);
        ...
    }
}
I noticed that the string test ž test is delivered in three pieces: "test ", "ž" and " test",
so elementValue never equals test ž test, which should be the result. Does anyone know how to solve this problem?
Is it necessary to re-encode the source string:
<?xml version="1.0" encoding="UTF-8"?>
<products>
<product>
<id>1</id>
<price>2.14</price>
<title>test ž test</title>
before it is passed to the XML handler class?
Thank you!
As Jon Skeet said in his answer, characters is called multiple times for a single element, so one call does not necessarily give you the whole text. What you should do is the following:
- In startElement (the start-tag callback), create a StringBuffer, and note (in a boolean value, for example) whether you are in the tag you are looking for.
- In characters, if you are in the right tag (the boolean set earlier is true), append the characters to the StringBuffer.
- In endElement (the end-tag callback), if you are leaving the right tag (same boolean as before), take the content of the StringBuffer and voilà! Here is your complete string. Don't forget to empty the StringBuffer after that.
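The steps above can be sketched roughly as follows, using DefaultHandler's startElement and endElement callbacks for the start-tag and end-tag steps. This is only a minimal sketch, not the asker's actual handler: the class name TitleHandler, the helper parseTitles and the choice of the title element as the target are assumptions made for illustration. It also assumes the default, non-namespace-aware JDK parser, where qName carries the element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <title> element.
class TitleHandler extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuffer buffer = new StringBuffer();
    private boolean inTitle = false; // true while we are inside <title>

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes attributes) throws SAXException {
        // Assumption: parser is not namespace-aware, so qName holds the tag name.
        if ("title".equals(qName)) {
            buffer.setLength(0); // start from an empty buffer
            inTitle = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // May be called several times per element, so append each chunk.
        if (inTitle) {
            buffer.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if ("title".equals(qName)) {
            titles.add(buffer.toString()); // the complete string
            buffer.setLength(0);           // empty the buffer for the next element
            inTitle = false;
        }
    }

    List<String> getTitles() {
        return titles;
    }

    // Hypothetical convenience method: parse an XML string and return the titles.
    static List<String> parseTitles(String xml) throws Exception {
        TitleHandler handler = new TitleHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getTitles();
    }
}
```

With this approach there should be no need to re-encode the source string first: however the parser splits the text between calls, the chunks are joined back together before the value is used in endElement.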