I have a Java program which is supposed to remove all non-letter characters from a string, except when they are part of a smiley face such as =) or =] or :P.
It’s very easy to match the opposite with [a-zA-Z ]|=\)|=\]|:P but I cannot figure out how to negate this expression. Since I am using the String.replaceAll() function it must be in the negated form.
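If String.replaceAll() is not a hard requirement, the positive pattern can be used as it is. Scan for everything that should be kept and concatenate the matches. The following is a minimal sketch using java.util.regex.Matcher.find(), and it assumes the four alternatives from the question are the full set of "keepers". The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class KeepMatches {
    // The "positive" pattern from the question: a letter/space or a smiley.
    private static final Pattern KEEP = Pattern.compile("[a-zA-Z ]|=\\)|=\\]|:P");

    static String keepOnly(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = KEEP.matcher(input);
        // find() skips over any text that does not match, so only the
        // pieces we want to keep are appended to the result.
        while (m.find()) {
            out.append(m.group());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepOnly("Hi! =) That's great =] :P")); // prints "Hi =) Thats great =] :P"
    }
}
```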
Could part of the issue be that smileys are generally two characters long, while I am only matching one character at a time?
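The one-character-at-a-time suspicion is correct. A negated character class can only ever look at a single character. There is also a way to stay inside a single replaceAll() call without negating the pattern. Match either a whole "keeper", captured in group 1, or any other single character, and replace every match with $1. When the catch-all branch matched, group 1 did not participate, and Java substitutes an empty string for it. This capture-group technique is different from a negated pattern, and the class and method names below are hypothetical.

```java
public class CaptureOrDrop {
    // Branch 1 (group 1): something to keep, either one of the smileys or a
    // single letter/space. Branch 2: any other single character. The
    // catch-all "." comes last, so a smiley is tried first and consumed as
    // a two-character unit.
    // Note: "." does not match line terminators by default, so newlines are
    // left in place by this sketch.
    private static final String KEEP_OR_ANY = "(=\\)|=\\]|:P|[a-zA-Z ])|.";

    static String clean(String input) {
        // Replacing with $1 keeps group 1 when it matched and drops the
        // character otherwise (an unmatched group expands to "").
        return input.replaceAll(KEEP_OR_ANY, "$1");
    }

    public static void main(String[] args) {
        System.out.println(clean("Hi! =) That's great =] :P")); // prints "Hi =) Thats great =] :P"
    }
}
```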
Interestingly, replaceAll("(?![Tt])[Oo]","") removes every occurrence of the letter O, even in the word “to.” Does this mean my replaceAll function does not understand regex lookahead? It doesn’t throw any errors…
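That result is expected behaviour, not a missing feature. A lookahead asserts something about the text starting at the current position, and that is the very character [Oo] then consumes. The small demonstration below contrasts it with a lookbehind, which checks the preceding character instead. The class and method names are hypothetical.

```java
public class LookaroundDemo {
    // (?![Tt]) inspects the character AT the current position, which is the
    // same character [Oo] is about to consume. An 'o' is never a 'T', so the
    // assertion always succeeds and every 'o' is removed.
    static String withLookahead(String s) {
        return s.replaceAll("(?![Tt])[Oo]", "");
    }

    // (?<![Tt]) inspects the character BEFORE the current position, so an
    // 'o' directly after a 't' survives.
    static String withLookbehind(String s) {
        return s.replaceAll("(?<![Tt])[Oo]", "");
    }

    public static void main(String[] args) {
        System.out.println(withLookahead("to do"));  // prints "t d"
        System.out.println(withLookbehind("to do")); // prints "to d"
    }
}
```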
I ended up using
replaceAll("(?<![=:;])[\\]\\[\\(\\)\\/]","")
.replaceAll("[=:;](?![\\]\\[\\(\\)o0OpPxX\\/])","")
.replaceAll("[^a-zA-Z=:;\\(\\)\\[\\]\\/ ]","")
which is extremely messy but works perfectly: “The... quick! (brown) fox jump's over the[] lazy dog. :] =O ;X” becomes “THE QUICK BROWN FOX JUMPS OVER THE LAZY DOG :] =O ;X”.
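Below, the three-step chain is wrapped in a small runnable class, with the regexes copied unchanged and the class and method names hypothetical. The three calls only remove characters and never change case. The all-caps output quoted above therefore presumably comes from a separate upper-casing step that isn't shown, and this sketch leaves the case alone.

```java
public class WorkaroundChain {
    static String clean(String s) {
        return s
            // 1. Drop brackets, parentheses and slashes not preceded by an "eye".
            .replaceAll("(?<![=:;])[\\]\\[\\(\\)\\/]", "")
            // 2. Drop eyes that are not followed by a "mouth".
            .replaceAll("[=:;](?![\\]\\[\\(\\)o0OpPxX\\/])", "")
            // 3. Drop everything that is not a letter, space, eye or mouth.
            .replaceAll("[^a-zA-Z=:;\\(\\)\\[\\]\\/ ]", "");
    }

    public static void main(String[] args) {
        String in = "The... quick! (brown) fox jump's over the[] lazy dog. :] =O ;X";
        // prints "The quick brown fox jumps over the lazy dog :] =O ;X"
        System.out.println(clean(in));
    }
}
```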
Edit: Ignore that fix; see the accepted answer below.
It should be pretty easy to do this using a negative lookahead. Basically, the match will fail at any position where the regex inside the (?!...) group matches. You should follow the negative lookahead with a single wildcard (.) to consume a character if the lookahead did not match (meaning that the next character is a non-letter character that is not part of a smiley face).
Edit: Clearly I hadn’t tested my original regex very thoroughly. You also need a negative lookbehind following the . to make sure that the character you consumed was not the second character in a smiley:
Note that you might be able to shorten the regex by using character classes for the eyes and the mouth, for example:
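The answer describes the structure of the pattern: a negative lookahead, then ".", then a negative lookbehind. The following Java sketch builds both the explicit and the character-class versions from that description. It assumes the three smileys from the question and that letters and spaces are kept. The patterns are a reconstruction, not necessarily the answer's exact text, and the class, field and method names are hypothetical.

```java
public class SmileyFilter {
    // Lookahead: fail where a smiley starts or where a letter/space is next.
    // ".": consume one character. Lookbehind: fail if that character was the
    // second character of a smiley. Note: "." does not match line
    // terminators by default, so newlines are left in place.
    static final String EXPLICIT = "(?!=\\)|=\\]|:P|[a-zA-Z ]).(?<!=\\)|=\\]|:P)";

    // Same idea with classes for the eyes [=:] and mouths [)\]P]. This also
    // keeps ":)", ":]" and "=P", which the explicit version removes.
    static final String CLASSES = "(?![=:][)\\]P]|[a-zA-Z ]).(?<![=:][)\\]P])";

    static String strip(String s, String regex) {
        return s.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String in = "Hi!! =) (ok) =] :P :)";
        System.out.println("\"" + strip(in, EXPLICIT) + "\""); // prints "Hi =) ok =] :P "
        System.out.println("\"" + strip(in, CLASSES) + "\"");  // prints "Hi =) ok =] :P :)"
    }
}
```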