I am new to the world of coding as well as PHP and am having a hard time understanding how regular expressions are read.
For example, I constructed the simple regular expression below that is a weak attempt to validate an email address.
The email address – test@test.com
The regular expression – ^([0-9a-zA-Z])+@([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$
What I would like to understand is how each segment of the email address in this example is dealt with and read e.g.
- test would be considered as segment 1,
- @ would be considered the second segment,
- the period (.) would be considered the third segment,
- etc
Obviously if I introduce an additional segment to the equation e.g. test-123 the regular expression fails.
the basics are
^ matches start of string
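() groups a segment and captures it, so you can extract and use it afterwards, e.g. through the matches array that preg_match fills in
([0-9a-zA-Z])+ means it will match 1 or more characters, and only the ones specified inside the square brackets. that is why test-123 fails: the hyphen is not in that list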
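To see what the groups actually capture, here is a minimal sketch using preg_match with the pattern from the question (the / delimiters and the variable names are my own additions). One thing to watch: because the + sits outside the first pair of brackets, that group is repeated once per character and only the last repetition is kept.

```php
<?php
// Pattern from the question, wrapped in / delimiters as preg_match requires.
$pattern = '/^([0-9a-zA-Z])+@([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$/';

// preg_match returns 1 on a match and 0 on no match.
$ok = preg_match($pattern, 'test@test.com', $matches);

// $matches[0] is the whole match; $matches[1] and $matches[2] are the () groups.
// A repeated group keeps only its last repetition.
var_dump($ok);         // int(1)
var_dump($matches[1]); // string(1) "t"  (the final "t" of "test")
var_dump($matches[2]); // string(5) "test."

// The hyphen is not in the first character class, so this one fails.
$hyphenOk = preg_match($pattern, 'test-123@test.com');
var_dump($hyphenOk);   // int(0)
```

Moving the + inside the brackets, as in ([0-9a-zA-Z]+), would capture the whole of test instead of just its last character.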
for an email address there are other chars that are valid, you should read the email RFC if you want to get into that detail
https://www.rfc-editor.org/rfc/rfc5322
there are alternative ways of doing this, e.g. if you put the i modifier after the closing delimiter of your pattern (/.../i) you can make it case insensitive, and then you don’t need to specify both a-z and A-Z
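A rough sketch of the case insensitive version, assuming the pattern from the question with only a-z left in each character class (the mixed case test address is made up for illustration):

```php
<?php
// Only a-z in each class; the i after the closing delimiter
// makes the whole pattern case insensitive.
$pattern = '/^([0-9a-z])+@([-0-9a-z]+[.])+[a-z]{2,6}$/i';

$mixedCase = preg_match($pattern, 'Test@Test.COM');
var_dump($mixedCase); // int(1)

// Without the modifier the upper case letters no longer match.
$noModifier = preg_match('/^([0-9a-z])+@([-0-9a-z]+[.])+[a-z]{2,6}$/', 'Test@Test.COM');
var_dump($noModifier); // int(0)
```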
the @ symbol is pretty obvious, a necessary part of the email address (in external systems, internal email doesn’t always need an @ as it can default to the internal domain)
([-0-9a-zA-Z]+[.])+
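this part matches the main part of the domain. I notice you have included the hyphen (-) in the character class this time, which is why hyphens are accepted in the domain but not before the @. also note that [.] is fine as it is: inside square brackets the . loses its special "match anything" meaning and matches only a full stop, so [.] does the same job as \. outside of square brackets.
so it would match 1 or more of these characters [-0-9a-zA-Z], followed by a full stop, and the + after the closing bracket lets that whole group repeat (test. or mail.test.)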
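It is easy to check how the dot behaves in each position. A minimal sketch (the short a.b style patterns and test strings are made up purely for illustration):

```php
<?php
// Inside a character class the dot is a literal full stop.
$classDot   = preg_match('/^a[.]b$/', 'axb'); // 0: "x" is not a full stop
$escapedDot = preg_match('/^a\.b$/', 'axb');  // 0: same meaning as [.]

// Outside a character class a bare dot matches any character (except a newline).
$bareDot    = preg_match('/^a.b$/', 'axb');   // 1

// All three forms accept a real full stop.
$literal    = preg_match('/^a[.]b$/', 'a.b'); // 1

var_dump($classDot, $escapedDot, $bareDot, $literal);
```

So [.] and \. both match only a literal full stop; it is the bare . that matches anything.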
[a-zA-Z]{2,6}
matches [a-zA-Z] with minimum length of 2, and max length 6
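The length limits can be tried on their own. A small sketch that tests just this last piece, anchored with ^ and $ so nothing else can sneak in (the sample strings are made up; the 2 and 6 come from the pattern in the question, not from any rule about real domain names):

```php
<?php
$tld = '/^[a-zA-Z]{2,6}$/';

$twoLetters   = preg_match($tld, 'nz');      // 1: 2 letters, the minimum
$oneLetter    = preg_match($tld, 'x');       // 0: too short
$sevenLetters = preg_match($tld, 'museums'); // 0: 7 letters, one over the maximum

var_dump($twoLetters, $oneLetter, $sevenLetters);
```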
$ matches the end of the string
if you had spaces after end of the email address it would fail validation, so you would need to trim it first in that case
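A quick sketch of the trimming step, assuming the pattern from the question and a made up input with stray spaces around it:

```php
<?php
$pattern = '/^([0-9a-zA-Z])+@([-0-9a-zA-Z]+[.])+[a-zA-Z]{2,6}$/';
$input   = '  test@test.com ';

// The surrounding spaces are not allowed by ^ ... $, so the raw input fails.
$raw = preg_match($pattern, $input);
var_dump($raw);     // int(0)

// trim() strips whitespace from both ends before validating.
$trimmed = preg_match($pattern, trim($input));
var_dump($trimmed); // int(1)
```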
matching an email address is actually not an easy thing to start with as there are quite a number of variations that are all valid
for example these could all be valid email addresses (the first one only on an internal system that fills in the default domain)
bumperbox
bumperbox@invalid.com
bumper-box@invalid.com
bumperbox@invalid.co.uk
bumper.box@subdomain.invalid.school.nz
Your best bet is to use one of the already established email validation patterns available around the web. There are a few discussions about email validation in the php manual under preg_match, etc
you can also use filter_var with the FILTER_VALIDATE_EMAIL filter if you have a recent (5.2+) version of php
http://nz.php.net/manual/en/function.filter-var.php
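For comparison, here is a minimal sketch that runs the example addresses from above through filter_var with FILTER_VALIDATE_EMAIL (the loop and variable names are just for illustration):

```php
<?php
$candidates = array(
    'bumperbox',
    'bumperbox@invalid.com',
    'bumper-box@invalid.com',
    'bumperbox@invalid.co.uk',
    'bumper.box@subdomain.invalid.school.nz',
);

foreach ($candidates as $email) {
    // filter_var returns the filtered string on success and false on failure.
    $result = filter_var($email, FILTER_VALIDATE_EMAIL);
    echo $email, ' => ', ($result === false ? 'rejected' : 'accepted'), "\n";
}
```

The bare bumperbox is rejected here because it has no @ and domain part, so an internal-only address like that would need separate handling.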