I’m trying to remove all punctuation and anything inside brackets or parentheses from a string in Python. The idea is to somewhat normalize song names to get better results when I query the MusicBrainz WebService.
Sample input: T.N.T. (live) [nyc]
Expected output: T N T
I can do it in two regexes, but I would like to see if it can be done in just one. I tried the following, which didn’t work…
>>> import re
>>> re.sub(r'\[.*?\]|\(.*?\)|\W+', ' ', 'T.N.T. (live) [nyc]')
'T N T live nyc '
If I split the \W+ into its own regex and run it second, I get the expected result, so it seems that \W+ is eating the brackets and parens before the first two alternatives can deal with them.
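You are correct that the \W+ is eating the brackets. The + is greedy, so a run such as ". (" is consumed as a single match starting at the ".", and the \(.*?\) alternative never gets a chance to start at the "(". Remove the + and you should be set.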
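Here is a minimal sketch of that fix using the sample input from the question. The name normalize_title is a hypothetical helper, not something from the question. The whitespace cleanup at the end is my own addition, since each removed piece leaves a single space behind and the result would otherwise end in a run of spaces.

```python
import re

# Bracketed and parenthesized groups are tried first; a single non-word
# character (no +) is the fallback, so it can no longer swallow "(" or "[".
PATTERN = re.compile(r'\[.*?\]|\(.*?\)|\W')


def normalize_title(title):
    """Hypothetical helper: strip bracketed text and punctuation from a title."""
    spaced = PATTERN.sub(' ', title)
    # Assumed cleanup step: collapse runs of spaces and trim both ends.
    return ' '.join(spaced.split())


print(repr(PATTERN.sub(' ', 'T.N.T. (live) [nyc]')))  # 'T N T' plus trailing spaces
print(repr(normalize_title('T.N.T. (live) [nyc]')))   # 'T N T'
```

This is still one regex. The split/join is only there to tidy the spaces, and a plain .strip() would also be enough for this particular sample.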