I have an HTML string:
<span class=thisword>anh</span><br />
-grand frère</span><br />
-cousin (fils d'un grand frère ou d'une grande soeur du père ou de la mère)</span><br />
-(nom générique désignant un homme encore jeune)</span><br />
I want to get the strings in it.
I have done the following:
Elements ed = docu.getElementsByTag("span");
for (Element e : ed)
{
    System.out.println(removeHTML(e.toString()));
    // removeHTML is a method that strips the HTML tags from the string it receives
}
It only displays the string
anh
I want it to display
anh -grand frère -cousin (fils d'un grand frère ou d'une grande soeur du père ou de la mère) -(nom générique désignant un homme encore jeune)
but I haven’t had any success. Can you help me?
The HTML isn’t valid… you can’t expect much of anything from that.
Your program is probably outputting four strings: one with text and the other three empty. Your HTML is being interpreted like this (at least by most browsers):
You’re not opening all these other spans that you’re supposedly closing. You need to start a new span before each line of text if you want it all to be included:
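As a rough sketch of the difference, here is a small standalone Java program that compares the original markup with a version where every line has its own opening span. It does not use your parser: `SpanTextDemo` and `spanTexts` are hypothetical names, and the regex is only a stand-in that works for flat, non-nested spans like these. The point is simply that text is only picked up when it sits between an opening and a closing span.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpanTextDemo {

    // Hypothetical helper: collects the text between each <span ...> and its </span>.
    // This is a regex stand-in for a real HTML parser, not a general solution.
    static List<String> spanTexts(String html) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("<span\\b[^>]*>(.*?)</span>").matcher(html);
        while (m.find()) {
            out.add(m.group(1).trim());
        }
        return out;
    }

    public static void main(String[] args) {
        // Original markup: only the first line has an opening <span>.
        String broken =
              "<span class=thisword>anh</span><br />\n"
            + "-grand frère</span><br />\n"
            + "-cousin (fils d'un grand frère ou d'une grande soeur du père ou de la mère)</span><br />\n"
            + "-(nom générique désignant un homme encore jeune)</span><br />\n";

        // Corrected markup: every line of text gets its own opening <span>.
        String fixed =
              "<span class=thisword>anh</span><br />\n"
            + "<span>-grand frère</span><br />\n"
            + "<span>-cousin (fils d'un grand frère ou d'une grande soeur du père ou de la mère)</span><br />\n"
            + "<span>-(nom générique désignant un homme encore jeune)</span><br />\n";

        System.out.println(spanTexts(broken)); // prints [anh]

        // Prints the four strings joined by single spaces.
        System.out.println(String.join(" ", spanTexts(fixed)));
    }
}
```

With the corrected markup, your original loop over the span elements should likewise see four elements instead of one.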