I'm looking for a regular expression that handles the following situation.
I need to clean certain tags within free-flowing text. For example, the text contains two important kinds of tag: <2004:04:12> and <name of person>. Unfortunately, some of the tags are missing the “<” or “>” delimiter.
For example, some are as follows:
1) <2004:04:12 , I need this to be <2004:04:12>
2) 2004:04:12>, I need this to be <2004:04:12>
3) <John Doe , I need this to be <John Doe>
I attempted to use the following for situation 1:
String regex = "<\\d{4}-\\d{2}-\\d{2}\\w*{2}[^>]";
String output = content.replaceAll(regex,"$0>");
This did find all instances of “<2004:04:12” and the result was “<2004:04:12 >”.
However, I need to eliminate the space prior to the ending tag.
I'm not sure this is the best way. Any suggestions?
Thanks
Basically, you are looking for a negative look-ahead, like this:
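A minimal sketch of that idea, assuming every date tag has exactly the colon-separated form dddd:dd:dd used in the examples. The class name DateTagFixer and the method fixDateTags are hypothetical, chosen only for illustration. The look-ahead checks for a missing “>” without consuming any characters, so no stray space ends up inside the tag; a matching look-behind handles the missing “<”.

```java
final class DateTagFixer {

    // Hypothetical helper: repairs date tags that lack one delimiter.
    static String fixDateTags(String content) {
        // Case 1: "<2004:04:12" with no ">" after it (optional whitespace allowed).
        // The negative look-ahead (?!\s*>) only inspects the following text.
        String closed = content.replaceAll("<(\\d{4}:\\d{2}:\\d{2})(?!\\s*>)", "<$1>");

        // Case 2: "2004:04:12>" with no "<" in front of it.
        // The negative look-behind (?<![<\d]) rejects dates that already
        // start with "<" or that are the tail of a longer digit run.
        return closed.replaceAll("(?<![<\\d])(\\d{4}:\\d{2}:\\d{2})(?=>)", "<$1");
    }

    public static void main(String[] args) {
        System.out.println(fixDateTags("<2004:04:12 , and 2004:04:12>, done"));
    }
}
```

Tags that are already well formed are left alone, and a date that has neither delimiter is not touched, because nothing marks it as a tag.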
This will help with the numeric “tags”. However, no regex can be intelligent enough to recognize an arbitrary name, so for the “name” tags you must either define very precisely what a name can look like or accept that the same approach will not work for them.
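To illustrate what "defining a name closely" might mean, here is a sketch under a deliberately narrow assumption: a name is exactly two capitalized ASCII words separated by a single space, as in “John Doe”. That definition, the class name NameTagFixer, and the method fixNameTags are all assumptions made for this example, not something the question specifies.

```java
final class NameTagFixer {

    // Hypothetical helper: closes "<First Last" tags that lack the ">".
    // Assumes a name is exactly two capitalized ASCII words.
    static String fixNameTags(String content) {
        // \b stops the engine from backtracking into the middle of the
        // surname just to satisfy the negative look-ahead.
        return content.replaceAll("<([A-Z][a-z]+ [A-Z][a-z]+)\\b(?!\\s*>)", "<$1>");
    }

    public static void main(String[] args) {
        System.out.println(fixNameTags("<John Doe , I need this fixed"));
    }
}
```

Anything outside that narrow definition, such as middle names, initials, hyphenated or accented names, is either skipped or closed in the wrong place, which is the limitation described above.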