I’m creating a system to sign up for different events. For each event, it stores an address that can be one of the following:
- Facebook resource (basically a URL starting with “facebook.com”)
- E-mail address (any valid e-mail)
- Another URL
- (bogus/trash/etc.)
The 4th is not important.
I need to do different things depending on the type of address (use the FB API, send an e-mail, or POST a form). I was thinking about just storing the type, but first I want to ask whether there is a regexp or something similar that can tell which type an address is.
The first one is easy: just check whether it starts with “http://www.facebook.com”. For the others, I thought about looking for tokens like “http://” or “@”, but then I realized that both an e-mail address and a URL can contain either of them.
First, @zespri is correct in his comment – it’s a much better design to store the actual type. Even if you use the regular expressions I suggest below, things could still break in the future.
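Storing the type could look something like the following minimal Python sketch. The `AddressType` enum, the `EventAddress` record and the `handle` function are all hypothetical names, and the returned strings merely stand in for the real actions (calling the Facebook API, sending an e-mail, POSTing a form).

```python
from dataclasses import dataclass
from enum import Enum


class AddressType(Enum):
    """Hypothetical set of address kinds, mirroring the list in the question."""
    FACEBOOK = "facebook"
    EMAIL = "email"
    URL = "url"
    OTHER = "other"


@dataclass
class EventAddress:
    # The type is recorded together with the raw value when the event is
    # created, so nothing has to be guessed later.
    kind: AddressType
    value: str


def handle(address: EventAddress) -> str:
    """Dispatch on the stored type; the strings are placeholders for real actions."""
    if address.kind is AddressType.FACEBOOK:
        return f"call Facebook API for {address.value}"
    if address.kind is AddressType.EMAIL:
        return f"send e-mail to {address.value}"
    if address.kind is AddressType.URL:
        return f"POST form to {address.value}"
    return "ignore"
```

Because the type is chosen when the address is entered (for example from a drop-down next to the input field), the dispatch is a plain lookup and cannot be fooled by an address that happens to contain both “http://” and “@”.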
But yes, it’s possible to use regex in this case:
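The following regex is a reasonable heuristic for detecting e-mail addresses, and much safer than just looking for an ‘@’ sign. It is not a complete validator, though: for example, it requires at least four characters before the ‘@’.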
    ([a-zA-Z]+[a-zA-Z0-9._+\-]{3,}(?:@|%40)[a-zA-Z0-9]+[a-zA-Z0-9\.\-]?(?:\.[a-zA-Z]+)+)
The following three find Facebook profiles and pages.
You can drop the path suffix to match just the Facebook domain(s), or research and adapt them further to match other kinds of Facebook resources:
    facebook\.(?:com?\.|net\.)?[a-z]{2,3}/.+\?id=(\d+)
    facebook\.(?:com?\.|net\.)?[a-z]{2,3}/p\.php.+i=(\d+)
    facebook\.(?:com?\.|net\.)?[a-z]{2,3}/(\w[\w\.\-]+\w)(?:$|[/\?#])
Avoid the ‘http://www.’ prefix – you never know which subdomain may be used, and the prefix is often omitted.
Also note that Facebook uses more TLDs than just .com.
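To try the three Facebook patterns, here is a minimal Python sketch using the standard `re` module. `facebook_resource` is a hypothetical helper: it applies the patterns with `re.search`, so they can match anywhere in the string (no reliance on an ‘http://www.’ prefix), and it returns the captured profile ID or page name from the first pattern that matches, or `None`.

```python
import re

# The three Facebook patterns, unchanged, tried in order.
FACEBOOK_PATTERNS = [
    re.compile(r"facebook\.(?:com?\.|net\.)?[a-z]{2,3}/.+\?id=(\d+)"),
    re.compile(r"facebook\.(?:com?\.|net\.)?[a-z]{2,3}/p\.php.+i=(\d+)"),
    re.compile(r"facebook\.(?:com?\.|net\.)?[a-z]{2,3}/(\w[\w\.\-]+\w)(?:$|[/\?#])"),
]


def facebook_resource(address):
    """Return the captured ID or page name if the address looks like a
    Facebook resource, otherwise None."""
    for pattern in FACEBOOK_PATTERNS:
        match = pattern.search(address)
        if match:
            return match.group(1)
    return None
```

For example, `facebook_resource("https://www.facebook.com/profile.php?id=12345")` gives `"12345"`, `facebook_resource("https://www.facebook.co.uk/somepage/")` gives `"somepage"` thanks to the optional `co.` part, and a non-Facebook URL gives `None`.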
For ‘other’ URLs, you could just look for an ‘http://’ or ‘https://’ prefix, anchored at the start of the string.
It’s unclear from your question whether users enter these addresses into your system or whether they come from an uncontrolled source. Note that people often omit the http prefix, so checking for it isn’t a reliable way to detect URLs.
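As a rough illustration of how these checks might be combined, here is a minimal Python sketch of a classifier. The function name `classify` and its return labels are hypothetical; the Facebook check is a simplified form of the patterns above (path dropped), with a `\b` word boundary added as an assumption so that something like `notfacebook.com` does not match. The order of the checks is a design choice: testing for a whole-string e-mail first means `john.smith@facebook.com` counts as an e-mail, while a URL that merely contains an ‘@’ in its query string does not.

```python
import re

# E-mail heuristic; it also accepts a URL-encoded "@" ("%40").
EMAIL_RE = re.compile(
    r"([a-zA-Z]+[a-zA-Z0-9._+\-]{3,}(?:@|%40)[a-zA-Z0-9]+[a-zA-Z0-9\.\-]?(?:\.[a-zA-Z]+)+)"
)
# Facebook domain only (path suffix dropped); the \b is an added assumption
# so that e.g. "notfacebook.com" is not mistaken for Facebook.
FACEBOOK_RE = re.compile(
    r"\bfacebook\.(?:com?\.|net\.)?[a-z]{2,3}(?:$|[/?#])", re.IGNORECASE
)
# "Other" URLs: an http:// or https:// prefix anchored at the start.
URL_RE = re.compile(r"^https?://", re.IGNORECASE)


def classify(address: str) -> str:
    """Guess the address type; the order of the checks matters."""
    address = address.strip()
    # The whole string must be an e-mail, so URLs containing "@" fall through.
    if EMAIL_RE.fullmatch(address):
        return "email"
    if FACEBOOK_RE.search(address):
        return "facebook"
    if URL_RE.match(address):
        return "url"
    # Everything else, including URLs typed without the http prefix.
    return "other"
```

Note that `classify("example.com/signup")` returns `"other"`: exactly the weakness of relying on the http prefix.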
If you’re looking for URLs as links within HTML pages, they can be detected more reliably by searching for anchors: