Here is what I quickly came up with. It works with RegexKitLite on the iPhone:
#define kUserRegex @"((?:@){1}[0-9a-zA-Z_]{1,15})"
Twitter usernames may contain only letters, digits, and underscores (_), up to a maximum of 15 characters (not counting the @). The regex seems fine, but it reports false positives on e-mail addresses.
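A minimal sketch of the false positive, assuming Python's `re` module as a stand-in for RegexKitLite (both accept the `(?:...)`, `{m,n}` and character-class syntax used here). The helper name `find_users` is hypothetical:

```python
import re

# The question's username pattern, transcribed unchanged into a Python raw string.
USER_RE = re.compile(r"((?:@){1}[0-9a-zA-Z_]{1,15})")

def find_users(text):
    """Return every substring captured by the question's username pattern."""
    return USER_RE.findall(text)

# A real mention and an e-mail address: both produce a match.
print(find_users("ping @jack or mail jack@example.com"))  # prints ['@jack', '@example']
```

Nothing in the pattern looks at the character before the `@`, so the domain part of an address is reported as a mention.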
#define kHashtagRegex @"((?:#){1}[0-9a-zA-Z_àáâãäåçèéêëìíîïðòóôõöùúûüýÿ]{1,140})"
kHashtagRegex works with accented Latin words, but it is not enough for other Unicode (UTF-8) words, such as those in non-Latin scripts.
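The gap can be illustrated with a short sketch, again assuming Python's `re` as a stand-in for RegexKitLite. It compares the question's hand-listed character class with `\w`, which on Python 3 `str` patterns matches word characters from any script (ICU's `\w` is likewise Unicode-aware). The helper names `listed_tags` and `word_tags` are hypothetical:

```python
import re

# The question's hashtag pattern: ASCII letters/digits/underscore plus a
# hand-picked list of accented Latin letters.
LISTED_RE = re.compile(r"((?:#){1}[0-9a-zA-Z_àáâãäåçèéêëìíîïðòóôõöùúûüýÿ]{1,140})")

# For comparison: \w matches Unicode letters and digits plus the underscore.
WORD_RE = re.compile(r"(#\w{1,140})")

def listed_tags(text):
    """Hashtags found by the hand-listed character class."""
    return LISTED_RE.findall(text)

def word_tags(text):
    """Hashtags found by the Unicode-aware \\w class."""
    return WORD_RE.findall(text)

for sample in ["#café", "#日本語", "#Москва"]:
    print(sample, listed_tags(sample), word_tags(sample))
```

`#café` is found by both patterns, but `#日本語` and `#Москва` are only found by the `\w` version, because no list of accented characters can cover every script.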
What is the ‘tech spec’ of a hashtag?
Is there a reference somewhere on what to use for parsing these? Or do you have advice on how to enhance this regex?
I’m not sure if this is complete, but this is what I would do:
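For the username, add a check for whitespace or start of string before the @ to eliminate e-mail addresses: (?:^|\s)
For the hashtags, I would just use \w (which already covers \d): in ICU, the regex engine RegexKitLite wraps, \w matches letters and digits from any script, plus the underscore.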
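Both suggestions can be combined into a sketch like the following, again using Python's `re` as a stand-in for RegexKitLite. The capture group keeps only the mention or tag, while the non-capturing `(?:^|\s)` prefix enforces the start-of-string/whitespace check. The names `find_users` and `find_hashtags` are hypothetical:

```python
import re

# Username: start of string or whitespace must precede the @, so
# "jack@example.com" no longer yields "@example".
USER_RE = re.compile(r"(?:^|\s)(@[0-9a-zA-Z_]{1,15})")

# Hashtag: the same prefix check, with Unicode-aware \w for the tag body.
HASHTAG_RE = re.compile(r"(?:^|\s)(#\w{1,140})")

def find_users(text):
    """Return @mentions that start the text or follow whitespace."""
    return USER_RE.findall(text)

def find_hashtags(text):
    """Return #tags that start the text or follow whitespace."""
    return HASHTAG_RE.findall(text)

print(find_users("@jack, mail jack@example.com or @ev_"))
print(find_hashtags("#café and #日本語 but not a#b"))
```

Two limitations of this approach: a mention directly after punctuation, such as `(@jack)`, is also rejected, and `{1,15}` does not stop a longer run of word characters from matching its first 15 characters.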