I have a string named s:
String s = "<NOUN>Sam</NOUN> , a student of the University of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue Olympiad Hotel";
I want to remove all <NOUN> and </NOUN> tags from the string. I used this to remove the tags:
s.replaceAll("[<NOUN>,</NOUN>]","");
Yes, it removes the tags, but it also removes the letters 'U' and 'O' from the string, which gives me the following output:
Sam , a student of the niversity of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue lympiad Hotel
Can anyone please tell me how to do this correctly?
Try alternation instead of a character class.
In regex, the syntax [...] matches any single character listed inside the brackets, regardless of the order they appear in. Therefore, in your example, every occurrence of "<", "N", "O", "U" and so on is removed. Instead, use the pipe (|) to match either "<NOUN>" or "</NOUN>" as a whole.
Making the forward slash optional should also work (and could be considered more DRY and elegant), since a single pattern then matches the tag both with and without the slash.
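Both approaches can be sketched as follows. This is a minimal illustration, not necessarily the exact code originally posted: the class name TagStripper and its two helper methods are made up for the example, and the sample string is shortened. Note that Java strings are immutable, so the result of replaceAll has to be assigned to a variable.

```java
final class TagStripper {  // hypothetical class name, for illustration only

    // Alternation: match either the opening or the closing tag as a whole.
    static String stripWithAlternation(String s) {
        return s.replaceAll("<NOUN>|</NOUN>", "");
    }

    // Optional slash: "/?" matches zero or one "/", so one pattern covers both tags.
    static String stripWithOptionalSlash(String s) {
        return s.replaceAll("</?NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> , a student of the University of oxford";
        // replaceAll returns a new string; assign it, or the change is lost.
        s = stripWithOptionalSlash(s);
        System.out.println(s);  // prints: Sam , a student of the University of oxford
    }
}
```

Since the tags contain no special regex characters, the plain String.replace method called once per tag would work too, without any regex at all.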