Taking the BBC News RSS feed for example, one of their news items is as follows:
<item><title>Pupils 'bullied on sports field'</title><description>bla bla..
I have some Java code that parses this. However, when a title contains an apostrophe (as above), the parsing stops, so I end up with the following title: Pupils ' and then it continues on and parses the description (which is fine). How do I get it to parse the full title? The following is a segment of code from inside my for loop where I parse the info:
NodeList title = element.getElementsByTagName("title");
Element line = (Element) title.item(0);
tmp.setTitle(getCharacterDataFromElement(line).toString());
The exact same code is used to parse the other elements, such as description and pubDate, which are all fine.
This is the getCharacterDataFromElement method:
public static String getCharacterDataFromElement(Element e) {
    Node child = ((Node) e).getFirstChild();
    if (child instanceof CharacterData) {
        CharacterData cd = (CharacterData) child;
        return cd.getData();
    }
    return "";
}
What am I doing wrong? I use DocumentBuilder, DocumentBuilderFactory and org.w3c.dom to work with the RSS feed.
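As davidfrancis suggested, you should iterate over all children in getCharacterDataFromElement(), because the parser may split the title's text into several adjacent text nodes and your method reads only the first one. Alternatively, if you can use DOM Level 3, you can use the Node.getTextContent() method instead, which does what you want.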
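For illustration, here is a minimal sketch of both options. The class name TitleText, the helper names parseTitle, titleByIteration and titleByTextContent, and the sample XML are all made up for this example. The sample assumes the feed writes the apostrophe as the &apos; entity, which I have not checked against the real BBC feed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.CharacterData;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class; the names here are not from the original code.
final class TitleText {

    // Option 1: concatenate every text and CDATA child, not just the first one.
    static String getCharacterDataFromElement(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Node child = e.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            // Comments are CharacterData too, so filter on the node type.
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                sb.append(((CharacterData) child).getData());
            }
        }
        return sb.toString();
    }

    // Parses the XML string and returns its first <title> element.
    static Element parseTitle(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName("title").item(0);
        } catch (Exception ex) {
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    static String titleByIteration(String xml) {
        return getCharacterDataFromElement(parseTitle(xml));
    }

    // Option 2: DOM Level 3 getTextContent() returns the element's full text.
    static String titleByTextContent(String xml) {
        return parseTitle(xml).getTextContent();
    }

    public static void main(String[] args) {
        // Assumed sample input, with the apostrophes written as entities.
        String xml = "<item><title>Pupils &apos;bullied on sports field&apos;</title></item>";
        System.out.println(titleByIteration(xml));
        System.out.println(titleByTextContent(xml));
    }
}
```

Both calls should give you the complete title regardless of how many text nodes the parser produced. Note that getTextContent() also includes the text of any nested elements, which should not matter for a plain title.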