I have the following string and I would like to extract the elements (xx=”yy”) and what’s between the brackets. Here’s an example:
this too please
I’ve tried the following code but I’m quite a noob with regex.
re.sub(r'\(.*)\[\/caption\]', "tokens: %1 %2 %3 %4 %5", self.content, re.IGNORECASE)
Thanks a lot in advance!
It’s probably not working for you because `.*` is greedy. Try `[^"]*` in its place; `[^"]` means the set of all characters except the double-quote character. Also, as you’ve pointed out in the comments, the backreference syntax in the replacement string is `\1`, `\2`, and so on, not `%1`, `%2`.
Do the contents of the caption tag span multiple lines? If they do, `.*` won’t capture the newlines. You’ll need to use something like `[^\x00]*` instead; `[^\x00]` means the set of all characters except the null character. On the off chance that your strings can legitimately contain null characters, you would need to use the `re.DOTALL` flag instead.
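Here is a minimal sketch of those suggestions in Python. It assumes the input is a WordPress-style `[caption ...]...[/caption]` shortcode. The original example string did not survive, so the sample `content` below is made up, and `parse_caption`, `CAPTION_RE` and `ATTR_RE` are hypothetical names. The sketch uses `[^"]*` for attribute values, a non-greedy `(.*?)` with `re.DOTALL` for the body, and `\1`-style backreferences in `re.sub`. It also passes the flags by keyword, because the fourth positional argument of `re.sub` is `count`, not `flags`, so `re.IGNORECASE` in that position does not do what the question's code intends.

```python
import re

# Hypothetical sample input (assumption): a WordPress-style caption shortcode
# whose body spans two lines.
content = ('[caption id="attachment_1" align="alignleft" width="300"]'
           '<img src="a.jpg" />\nthis too please[/caption]')

# [^"]* stops at the closing quote, so each value is captured on its own
# instead of greedily running to the last quote in the tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')

# Group 1: everything inside the opening tag. Group 2: the body.
# (.*?) is non-greedy, and re.DOTALL lets . match newlines too.
CAPTION_RE = re.compile(r'\[caption([^\]]*)\](.*?)\[/caption\]',
                        re.IGNORECASE | re.DOTALL)


def parse_caption(text):
    """Return (attributes dict, body text) for the first caption tag, or None."""
    match = CAPTION_RE.search(text)
    if match is None:
        return None
    # findall with two groups yields (name, value) tuples.
    attrs = dict(ATTR_RE.findall(match.group(1)))
    return attrs, match.group(2)


attrs, body = parse_caption(content)
print(attrs)  # {'id': 'attachment_1', 'align': 'alignleft', 'width': '300'}

# Backreferences in the replacement are \1, \2, ...; flags go by keyword.
result = re.sub(r'\[caption([^\]]*)\](.*?)\[/caption\]', r'attrs:\1 body:\2',
                content, flags=re.IGNORECASE | re.DOTALL)
print(result)
```

The `[^\]]*` for the opening tag assumes attribute values never contain a literal `]`. If yours can, you would need a more careful pattern or a real shortcode parser.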