So I am trying to incorporate a hashtag search into posts on my application. I have come up with a few close Regular Expressions, but none seem to work. Let’s say I have a string:
#i keep #hashtags in my mom's attic. She says I am her number #1 son. Why I'm not num#ber #2: http://pics.com/pic#thepic
I would want the RegEx to match:
-
i
-
hashtags
-
1
-
2
I want it to specifically NOT match HTML entities, #anchors in URLs, hashtags that start in the middle of a word, and any trailing punctuation after a hashtag. I also want to make sure it matches a hashtag at the beginning of the string, as well as hashtags that are wholly numeric. I want it to allow hyphens and underscores in hashtags as well. This is for a MySQL query, so Lookaheads and Lookbacks won’t work.
The current one I’m stuck on is:
#([A-Za-z0-9_]+)
But it’s not really doing the job. RegEx is an area in which my knowledge is a bit lacking. Thanks for any help!
The following matches all the examples you want matched, and rejects all the ones you don’t want matched:
You don’t have to match the whole hashtag, since SQL won’t return that string anyway, it just returns 1 (match) or 0 (no match). To extract the hashtag string, you’ll have to use some other string manipulation in you application programming language.