I have a string in the format:
t='@abc @def Hello this part is text'
I want to get this:
l=['abc', 'def']
s='Hello this part is text'
I did this:
a=t[:t.find(' ',t.rfind('@'))].strip()   # the leading '@abc @def' part
s=t[t.find(' ',t.rfind('@')):].strip()   # the remaining text
b=a.split('@')
l=[i.strip() for i in b][1:]
It works for the most part, but it fails when the text part itself contains an ‘@’. For example, when:
t='@abc @def My email is red@hjk.com'
it fails. The @names always come at the beginning, and the text after them may itself contain an @.
Clearly I could prepend a space and then look for the first word without an ‘@’, but that doesn’t seem like an elegant solution.
What is a pythonic way of solving this?
Building unashamedly on MrTopf’s effort:
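A minimal sketch of the idea, reconstructed from the explanation that follows. The names rx and split_tags are illustrative assumptions, not necessarily the original code:

```python
import re

# One or more "@tag" tokens (each followed by spaces), then the rest of the string.
rx = re.compile(r"((?:@\w+ +)+)(.*)")

def split_tags(t):
    """Return (list of leading @names, remaining text)."""
    m = rx.match(t)
    if m is None:
        # No leading "@name " run (a tag needs at least one trailing space),
        # so treat the whole string as text.
        return [], t
    tags, rest = m.groups()
    # Split the tag run on whitespace and drop the leading '@' from each tag.
    return [tag[1:] for tag in tags.split()], rest

l, s = split_tags('@abc @def My email is red@hjk.com')
print(l)
print(s)
```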
prints:
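Applied to the question's failing example (t='@abc @def My email is red@hjk.com'), splitting the leading tags from the text this way should give, under Python 3:

```text
['abc', 'def']
My email is red@hjk.com
```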
Justly called to account by hasen j, let me clarify how this works:
The pattern @\w+ + matches a single tag: an @ followed by at least one alphanumeric character or underscore, followed by at least one space character. The + is greedy, so if there is more than one space, it will grab them all.
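To see the greedy space matching in action, here is a small check. The name single_tag and the sample string are just for illustration:

```python
import re

# '@', one or more word characters, then one or more spaces.
single_tag = re.compile(r"@\w+ +")

m = single_tag.match('@abc   @def Hello')
# The match covers the first tag and ALL of the spaces after it,
# but stops before the second tag.
print(repr(m.group()))
```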
To match any number of these tags, we need to add a plus (one or more things) to the pattern for tag; so we need to group it with parentheses:
The grouped pattern (@\w+ +)+ matches one or more tags and, being greedy, matches all of them. However, those parentheses now interfere with our capture groups, so we undo that by turning them into a non-capturing (anonymous) group, (?:@\w+ +)+.
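The difference between the two kinds of group can be checked directly. In this sketch (sample string and variable names are assumptions), the capturing version keeps only the last repetition in its group, while the non-capturing version creates no group at all:

```python
import re

t = '@abc @def Hello'

# Capturing group: the overall match spans both tags,
# but group 1 only remembers the LAST repetition.
m1 = re.match(r"(@\w+ +)+", t)

# Non-capturing group: same overall match, no capture group created.
m2 = re.match(r"(?:@\w+ +)+", t)

print(repr(m1.group(0)), m1.groups())
print(repr(m2.group(0)), m2.groups())
```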
Finally, we wrap that in a capture group and add another to sweep up the rest, giving ((?:@\w+ +)+)(.*).
A last breakdown to sum up:
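One way to lay the breakdown out is as an annotated pattern using re.VERBOSE. This is a sketch rather than the original diagram: the name rx_verbose is an assumption, and the space is written as [ ] because verbose mode ignores bare whitespace in the pattern:

```python
import re

rx_verbose = re.compile(r"""
    (               # group 1: the whole run of tags
        (?:         #   non-capturing group wrapping one tag
            @\w+    #     an at-sign followed by one or more word characters
            [ ]+    #     one or more spaces, bracketed so VERBOSE keeps them
        )+          #   one or more tags, matched greedily
    )
    (.*)            # group 2: all the rest
""", re.VERBOSE)

tags, rest = rx_verbose.match('@abc  @def Hello @x').groups()
print(repr(tags))
print(repr(rest))
```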
Note that in reviewing this, I’ve improved it – \w didn’t need to be in a set, and it now allows for multiple spaces between tags. Thanks, hasen-j!