Being a designer with limited coding experience, I always considered regex to be some kind of black magic. Recently, I’ve been reading up a bit – and I’m getting pretty intrigued by its possibilities. So I decided to give it a first try in my current PHP project.
I want to find all URLs of the following structure:
http://[any subdomain, only a-z].domain.com/[any subfolder, can contain a-z,A-Z,0-9,- and _]/
Examples:
My regex:
http://[a-z]*\.domain\.com/[A-Za-z0-9\_\-]*/
My questions:
- The regex is working, but I’m just wondering whether it could be improved. For instance, I tried adding case insensitivity with `(i?)` but couldn’t get it working.
- I could only get it working in PHP if I added double quotes at the start and end of the expression. Why is that?
$ref = preg_replace('"http://[a-z]*\.domain\.com/[A-Za-z0-9\_\-]*/"','',$ref);
In PHP, a regex must be delimited, usually by `/`, but the delimiter can be almost any character (anything that is not alphanumeric, a backslash, or whitespace). Your second attempt works because you’re using `"` as the delimiter.

To make the match case-insensitive, put the `i` flag after the closing delimiter, for example `"http://[a-z]*\.domain\.com/[A-Za-z0-9\_\-]*/"i`.

With the `i` flag there is no need for `[a-zA-Z]`; `[a-z]` suffices. Moreover, you don’t need to escape the underscore `_` in a character class, nor the dash `-` if it’s placed first or last within the class.

Note that `[a-zA-Z0-9_]` can be abbreviated as `\w`, so the subfolder class can be written as `[\w-]`.

Take into account that `*` means 0 or more times, so your regex will also match something like `http://.domain.com//`. Change `*` to `+`, which means 1 or more times, to be sure you have at least one character for the subdomain and one for the subfolder.

Finally, `"` is an unusual delimiter; use for example `#`, `~` or `!`. Putting it all together gives something like `#http://[a-z]+\.domain\.com/[\w-]+/#i`.