Possible Duplicate:
Removing HTML from a Java String
I am having a problem removing HTML tags from a text file in Java. I know it would be easy to use something like
str = str.replaceAll("<.*?>", "");
However, I want to know if I could split the string, go through it, and replace everything starting from < to > with "".
I tried
String [] str= "<tag>with some string </tag>";
String s="";
for (i=0; i < str.length; i++)
{
if (str[i].toString()=="<")
{
str[i]="";
}
else if (str[i].toString()==">")
{
s=s+str[i+1];
}
}
When I try printing the new string s, it prints only white space.
Thanks for the help.
You need a flag variable that records whether you are currently inside a tag, plus a third branch for the case when you are not inside a tag, so that the rest of the content gets appended to the result string. For example:
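Here is a minimal sketch of that flag approach. It assumes the input is already held in a single String and that every < opens a tag and every > closes one. The names TagStripper and stripTags are made up for illustration. It walks the string one character at a time instead of splitting it, and compares characters with == (which is safe for char, unlike for String).

```java
public class TagStripper {

    // Hypothetical helper: keeps only the characters that lie outside <...>.
    static String stripTags(String input) {
        StringBuilder out = new StringBuilder();
        boolean insideTag = false; // true while we are between '<' and '>'

        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '<') {
                insideTag = true;   // a tag starts here, stop copying
            } else if (c == '>') {
                insideTag = false;  // the tag ends here, resume copying
            } else if (!insideTag) {
                out.append(c);      // third case: ordinary text outside a tag
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the text with both tags removed.
        System.out.println(stripTags("<tag>with some string </tag>"));
    }
}
```

This is not a real HTML parser: a literal < or > in the text, or a > inside an attribute value or comment, will confuse it. For anything beyond simple, well-formed input, a proper HTML parsing library is likely the safer choice.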