I want to match words that contain special characters or that begin with ‘http://’.
So this sentence
%he#llo, my website is: http://www.url.com/abcdef123
should turn into this
my website
So far, I have this:
import re
re.sub(r"^[^\w]", " ", "%he#llo, my website is: http://www.url.com/abcdef123")
This only replaces the leading symbol with a space. It doesn’t remove the words that contain symbols (it also doesn’t remove ‘:’ and ‘,’), nor does it remove the URL.
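A quick check shows why the attempt falls short: the ^ anchor only lets the pattern match at the very start of the string, so at most the leading % is replaced and everything else, including #, the comma, the colon and the URL, is left as it was. The variable names below are just for illustration.

```python
import re

text = "%he#llo, my website is: http://www.url.com/abcdef123"

# '^' anchors the pattern to the start of the string, so only a single
# leading non-word character (here '%') can ever be replaced.
attempt = re.sub(r"^[^\w]", " ", text)
print(repr(attempt))  # ' he#llo, my website is: http://www.url.com/abcdef123'
```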
For the example string you give, the following regular expression works OK:
(http://\S+|\S*[^\w\s]\S*)
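A minimal sketch of using that pattern with re.findall on the example string. Because the pattern has exactly one capturing group, findall returns the text captured by that group for each match.

```python
import re

text = "%he#llo, my website is: http://www.url.com/abcdef123"

# Left alternative: 'http://' followed by one or more non-space characters.
# Right alternative: a run of non-space characters containing at least one
# character that is neither a word character nor whitespace.
pattern = re.compile(r"(http://\S+|\S*[^\w\s]\S*)")

matches = pattern.findall(text)
print(matches)  # ['%he#llo,', 'is:', 'http://www.url.com/abcdef123']
```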
… or you can remove those words with re.sub.
The | means alternation and will match the expression on either side within the group. The part on the left matches http:// followed by one or more non-space characters. The part on the right matches zero or more non-space characters, followed by anything that isn’t a word or space character, followed by zero or more non-space characters; that ensures you have a string with at least one non-word character and no spaces.
Updated: Of course, as the other answers implicitly suggest, since the http:// prefix contains a non-word character (/), you don’t need it as a separate alternative; you could simplify the regular expression to \S*[^\w\s]\S*. However, the example above with alternation may still be useful.
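Here is a sketch of the removal step with re.sub, comparing the alternation pattern with the simplified one. Substituting an empty string leaves the surrounding spaces behind, so the last step (a choice made here for illustration, not part of the regex itself) splits and rejoins the result to get exactly the requested output.

```python
import re

text = "%he#llo, my website is: http://www.url.com/abcdef123"

# Pattern with an explicit http:// alternative.
with_alternation = r"(http://\S+|\S*[^\w\s]\S*)"
# Simplified pattern: URLs already contain '/', which is a non-word character.
simplified = r"\S*[^\w\s]\S*"

removed = re.sub(with_alternation, "", text)
print(repr(removed))  # ' my website  '

# Both patterns remove the same words from this input.
same = re.sub(simplified, "", text) == removed
print(same)  # True

# Collapse leftover whitespace to get the requested output.
cleaned = " ".join(removed.split())
print(cleaned)  # my website
```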