Here is my string:
String str = "<pre><font size=\"5\"><strong><u>LVI . The Day of Battle</u></strong></font>\n"
        + "<font\n"
        + "size=\"4\"><strong>";
I want to remove all HTML tags from a string using StringTokenizer, but I don't understand how to use StringTokenizer in this situation. When I use str.replaceAll("\\<.*?>",""), it does not remove all the tags, because some tags continue onto the next line of the string, as in the string above. I want it to work for everything between < and >. How can I do it? (I want to achieve it using StringTokenizer.) Thanks.
Trying to process HTML with regexes or StringTokenizer alone is… painful. This answer is compulsory reading before you go any further.
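If your HTML files are simple, you might get away with removing the newlines, then applying a regex, then reformatting the HTML. Alternatively, compile the regex in DOTALL mode: by default `.` does not match line terminators, which is why a tag split across two lines survives your replaceAll call.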
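For the simple case only, here is a minimal sketch of both routes: a regex compiled with `Pattern.DOTALL`, and the `StringTokenizer` approach the question asks for, using `<` and `>` as delimiters and asking the tokenizer to return them as tokens. The class name `TagStripper` and its method names are made up for illustration. Neither method understands comments, `<script>` bodies, or a `>` inside an attribute value, so treat this as a stopgap rather than a substitute for a parser.

```java
import java.util.StringTokenizer;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
public class TagStripper {

    // DOTALL lets '.' match line terminators, so a tag may span lines.
    private static final Pattern TAG = Pattern.compile("<.*?>", Pattern.DOTALL);

    public static String stripWithRegex(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static String stripWithTokenizer(String html) {
        // returnDelims = true: "<" and ">" come back as their own tokens.
        StringTokenizer st = new StringTokenizer(html, "<>", true);
        StringBuilder out = new StringBuilder();
        boolean inTag = false;
        while (st.hasMoreTokens()) {
            String tok = st.nextToken();
            if (tok.equals("<")) {
                inTag = true;            // start skipping
            } else if (tok.equals(">") && inTag) {
                inTag = false;           // end of tag, resume copying
            } else if (!inTag) {
                out.append(tok);         // text outside any tag (newlines included)
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String str = "<pre><font size=\"5\"><strong><u>LVI . The Day of Battle</u></strong></font>\n"
                + "<font\n"
                + "size=\"4\"><strong>";
        System.out.println(stripWithRegex(str));
        System.out.println(stripWithTokenizer(str));
    }
}
```

For the string in the question, both methods return the title text followed by the newline that sat between the two `<font>` tags.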
But you should really look at using a proper HTML parser. See this question (and probably many others…)