I’m trying to add tags to some given query strings, and the tags should wrap around all the matching strings.
For example, I want to wrap tags around all the words that match the query "iphone games mac" in the sentence "I love downloading iPhone games from my mac." The result should be "I love downloading <em>iPhone games</em> from my <em>mac</em>."
Currently, I tried
import re

sentence = "I love downloading iPhone games from my mac."
query = r'((iphone|games|mac)\s*)+'
regex = re.compile(query, re.I)
sentence = regex.sub(r'<em>\1</em> ', sentence)
The sentence outputs
I love downloading <em>games </em> from my <em>mac</em> .
Here \1 is replaced by only one word (games instead of iPhone games), and there are unnecessary spaces after the words. How do I write the regular expression to get the desired output? Thanks!
Edit:
I just realized that both Fred's and Chris's solutions have problems when a query word appears inside another word. For instance, if my query is game, the result is <em>game</em>s, whereas I don't want it highlighted at all. Another example: the "the" in "either" shouldn't be highlighted.
Edit 2:
I took Chris’ new solution and it works.
First of all, to get the spaces as you want them, replace \s* with \s*? to make it non-greedy.
First fix:
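As a hedged sketch of that change (the exact pattern is an assumption reconstructed from the question's code, with the trailing space dropped from the replacement template), the non-greedy version might look like this:

```python
import re

sentence = "I love downloading iPhone games from my mac."

# Assumed pattern: the question's regex with \s* made lazy (\s*?) and an
# extra outer group so that \1 refers to the whole match.
lazy = re.compile(r'(((iphone|games|mac)\s*?)+)', re.I)

# The lazy \s*? matches no whitespace here, so each word is matched alone.
first_fix = lazy.sub(r'<em>\1</em>', sentence)
print(first_fix)
```

With this pattern the spaces stay outside the tags, but iPhone and games are wrapped separately.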
Unfortunately, once the \s* is non-greedy, it splits phrases: iPhone and games each get their own tag. Without it, the two are grouped together, but the trailing space ends up inside the tag. I can't think yet how to fix this.
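For comparison, here is a minimal sketch of the greedy variant mentioned above. The pattern is again an assumption based on the question's regex, with only the outer group added:

```python
import re

sentence = "I love downloading iPhone games from my mac."

# Assumed pattern: greedy \s* inside the repeated group, plus an outer
# group so that \1 covers every repetition rather than just the last one.
greedy = re.compile(r'(((iphone|games|mac)\s*)+)', re.I)

# The greedy \s* swallows the space after "games", so that space ends up
# inside the <em> element.
grouped = greedy.sub(r'<em>\1</em>', sentence)
print(grouped)
```

The phrase stays together, at the cost of a stray space before the closing tag.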
Note also that in these I have stuck in an extra set of brackets around the whole repeated group, including the +, so that \1 captures the entire match rather than only the last repetition. That's the difference from your version.
Further update: actually, I can think of a way to get around it. You decide whether you want it like that.
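One possible way around it, sketched here as an assumption rather than a definitive answer, is to match one word first and then allow whitespace only between further words, so that no trailing whitespace is ever captured:

```python
import re

sentence = "I love downloading iPhone games from my mac."

# Assumed pattern: a word, then zero or more (whitespace + word) pairs.
# Whitespace is consumed only when another query word follows it.
regex = re.compile(r'((iphone|games|mac)(\s*(iphone|games|mac))*)', re.I)

result = regex.sub(r'<em>\1</em>', sentence)
print(result)
```

The price is that the word alternation has to be written twice in the pattern.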
Update: taking your point about word boundaries into account, we only need to add in a few instances of \b, the word boundary matcher.
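Putting the pieces together, a minimal sketch with \b around each word could look like the following. The helper name highlight and its parameters are hypothetical, introduced only for illustration, and it assumes the query words consist of ordinary word characters:

```python
import re

def highlight(sentence, words):
    """Wrap runs of whole-word matches in <em> tags (hypothetical helper)."""
    if not words:
        return sentence
    # One query word, anchored on both sides by a word boundary.
    word = r'\b(?:' + '|'.join(re.escape(w) for w in words) + r')\b'
    # A word, followed by zero or more (whitespace + word) pairs.
    pattern = re.compile(rf'({word}(?:\s+{word})*)', re.I)
    return pattern.sub(r'<em>\1</em>', sentence)

sentence = "I love downloading iPhone games from my mac."
print(highlight(sentence, ["iphone", "games", "mac"]))
print(highlight(sentence, ["game"]))            # "games" is left alone
print(highlight("the cat in either hat", ["the"]))
```

Building the pattern from a list keeps the alternation in one place, and re.escape guards against query words that contain regex metacharacters.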