I’m trying to create a query that filters tweets by the @ or # tags.
So I just want the results for either @Obama or #Obama but not Obama. This is what I have so far:
re.compile(r'\b(?:#|@|)*%s*\b' % re.escape(obama), re.IGNORECASE)
Thanks for the replies. I tried both answers, and what seems to work in my situation is:
re.compile(r'\b[#@]*%s\b' % re.escape(term), re.IGNORECASE)
‘term’ is an element in a list which I iterate over. This then returns tweets that have either a # or @ prepended to the ‘term’. I tried not using ‘*’, but it was giving out exceptions.
Thanks
Try using this regular expression:
re.compile(r'\b[#@]%s\b' % re.escape(obama), re.IGNORECASE)
The character class [#@] works faster than the alternation group (?:#|@). So we begin with the word boundary \b, followed by # or @. Then comes the value substituted from the obama variable, and then the trailing boundary.
In the question you used * quantifiers, which repeat the preceding expression zero or more times. There is no reason to repeat the # and @ symbols. Also, the last symbol of obama shouldn’t be repeated either.
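One caveat worth checking before relying on any of these patterns: \b only matches between a word character and a non-word character, and # and @ are both non-word characters. So a leading \b in front of [#@] matches only when the tag is glued to a preceding letter or digit (as in x@obama), not after a space or at the start of the tweet. That would also explain why dropping ‘*’ appeared to fail: search() most likely returned None, and [#@]* only "works" because it can match zero tag characters, which lets plain Obama through. Below is a minimal sketch comparing the three variants. The function names and sample tweets are made up for illustration, and the negative lookbehind (?<!\w) is a suggested alternative, not something taken from the answers above.

```python
import re

def question_pattern(term):
    # Pattern from the follow-up: [#@]* may match zero characters,
    # so an untagged term is accepted as well.
    return re.compile(r'\b[#@]*%s\b' % re.escape(term), re.IGNORECASE)

def described_pattern(term):
    # Pattern as described in the answer: boundary, one tag character,
    # the escaped term, trailing boundary.
    return re.compile(r'\b[#@]%s\b' % re.escape(term), re.IGNORECASE)

def tag_pattern(term):
    # Suggested alternative (an assumption, not from the thread):
    # (?<!\w) requires that no word character precedes the tag,
    # [#@] requires exactly one tag character,
    # \b requires the term to end at a word boundary.
    return re.compile(r'(?<!\w)[#@]%s\b' % re.escape(term), re.IGNORECASE)

# Hypothetical sample data.
tweets = ["I like @Obama", "#obama 2012", "Obama spoke today"]

tagged = [t for t in tweets if tag_pattern("obama").search(t)]
print(tagged)
```

With this sample data, tag_pattern keeps the two tagged tweets and drops the untagged one, question_pattern matches all three, and described_pattern matches none of them.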