I am using Java 1.6.0.
I am looking for the correct way to handle special HTML characters in Java.
My HTML
<div id="sliceXML">Florida</div>
I am trying to set a String xmlMatch to the content of the above div tag (in this case Florida) using the Java below. However, I believe my startTag or endTag is not defined correctly.
My Java
String testContent = contentPara;
String startTag = "\"sliceXML\">";
String endTag = "</div";
String xmlMatch = null;
int startPosition = testContent.indexOf(startTag);
if (startPosition > 1) {
    int subStringIndex = startPosition + startTag.length();
    int endPosition = testContent.indexOf(endTag, subStringIndex);
    if (endPosition >= startPosition) {
        xmlMatch = testContent.substring(subStringIndex, endPosition);
        out.println(xmlMatch.length());
        //out.println(startTag);
        out.println("Florida".equals(xmlMatch));
        out.println("florida".equals(xmlMatch));
    }
}
Any help is much appreciated. This would also allow me to answer a previous related question here.
EDIT
WORK AROUND SOLUTION
As I explain below, I believe my issue was with the forward slash in String endTag = "</div"; To get past this problem, I simply changed my end tag to String endTag = "<";
I still don't know why this happened; it would be great if someone could explain it.
I would really use an HTML parser, such as the confusingly-named JTidy (it’s an HTML pretty-printer, but also gives you a DOM interface to the HTML structure).
It’ll save you from headaches such as parsing the markup by hand and handling character entities and encodings.
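To illustrate the DOM approach, here is a minimal sketch. JTidy is not part of the JDK, so this example swaps in the JDK's built-in XML DOM parser instead; that substitution only works when the markup is well-formed (as the div in the question is), whereas a real HTML parser such as JTidy would also cope with messy HTML. The class name SliceXmlExtractor and the helper extractDivText are hypothetical names made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; not part of JTidy or of the question's code.
class SliceXmlExtractor {

    // Returns the text content of the first <div> whose id attribute equals
    // the given id, or null if there is no such div.
    // Assumption: the markup is well-formed XML; malformed HTML will make
    // the JDK parser throw an exception.
    static String extractDivText(String markup, String id) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(markup)));

        NodeList divs = doc.getElementsByTagName("div");
        for (int i = 0; i < divs.getLength(); i++) {
            Element div = (Element) divs.item(i);
            if (id.equals(div.getAttribute("id"))) {
                // getTextContent() returns the text with entities such as
                // &amp; already decoded by the parser.
                return div.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String html = "<div id=\"sliceXML\">Florida</div>";
        System.out.println(extractDivText(html, "sliceXML"));
    }
}
```

With a parser doing the work, there is no start or end tag string to get wrong, and character entities in the div content are decoded for you rather than compared as raw text.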