I'm trying to build a regular expression to validate (with preg_match) a path string against the following two rules:
- the path must consist only of symbols from the given range [a-zA-Z0-9-_\///\.]
- the path must not contain an up-directory sequence “..”
This is an example of a correct path: /user/temp
and this is a bad one: /../user
UPD:
/user/temp.../foo should also count as correct (thanks to Laurence Gonsalves).
I built the pattern in three steps:
1) The first rule says that the only symbols allowed in the string are 0-9, a-zA-Z, _ (underscore), - (hyphen), . (dot) and both slashes (/ and \). The first three can be expressed with a shortcut (\w); the others require a character class. Note two things here: first, the hyphen should be either the first or the last symbol in the character class (otherwise it's treated as a metacharacter used to define a range); second, neither the dot nor the forward slash is escaped (a dot is literal inside a character class, and the slash is dealt with by the choice of delimiters at the end). The backslash is escaped, though; it's too powerful to be left alone, even within a [...] subexpression.
2) Now we have to make sure that the pattern really covers the whole string. We do it with so-called anchors: ^ for the beginning of the string and $ for the end. And we must not forget that the string may consist of one or more allowed symbols, which is expressed with the + quantifier. So the pattern becomes the character class followed by +, wrapped in ^ and $.
3) Finally, we have to prevent ../ and ..\ as well (whether preceded by / or \, or not preceded by anything when the ..[/\\] sequence begins the string). The easiest way of expressing this rule is a so-called ‘negative lookahead’ test. It's written as a (?!…) subexpression, and here it describes the following idea: ‘make sure that no sequence of zero or more symbols is followed by a “slash, two dots, slash” sequence’.
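The three steps above can be sketched in Python rather than PHP. Python's re module supports the constructs used here (character class, ^ and $ anchors, the + quantifier, negative lookahead) with the same syntax as PCRE, which preg_match uses. This is a hedged translation, not the original PHP code: the names STEP1, STEP2, STEP3 and check are hypothetical, and the re.ASCII flag is an added assumption so that \w stays within [a-zA-Z0-9_].

```python
import re

# Step 1: the allowed symbols. \w covers 0-9, a-z, A-Z and _;
# the dot and the forward slash are literal inside the class,
# the backslash is escaped, and the hyphen is last so it is not a range.
STEP1 = r"[\w./\\-]"

# Step 2: one or more allowed symbols, anchored at both ends.
STEP2 = r"^" + STEP1 + r"+$"

# Step 3: a negative lookahead right after ^. It rejects the string if
# some run of symbols (.*) ends either at the very start of the string
# or at a slash (/ or \) that is followed by ".." and another slash.
STEP3 = r"^(?!.*(?:^|[/\\])\.\.[/\\])" + STEP1 + r"+$"


def check(pattern: str, path: str) -> bool:
    # re.ASCII (assumption) keeps \w to ASCII letters, digits and underscore.
    # fullmatch also rejects a trailing newline that a bare $ would allow.
    return re.fullmatch(pattern, path, re.ASCII) is not None


for path in ["/user/temp", "/../user", "/user/temp.../foo", "/user temp"]:
    print(path, "step 2:", check(STEP2, path), "step 3:", check(STEP3, path))
```

Step 2 alone accepts /../user, because every symbol in it is allowed; only the lookahead added in step 3 rejects it, while /user/temp.../foo still passes because its dots are not preceded by a slash. Note that the lookahead sketched here only rejects ".." when another slash follows it, so /user/.. still passes; widening the last [/\\] inside the lookahead to (?:[/\\]|$) would reject that case too.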
The very last step is placing the pattern into the preg_match function: as we use the / symbol within the regex, we can just choose another set of delimiters. In my example, I chose ‘#’. See? It's really easy :) You just have to start from small things and gradually develop them.
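For comparison, here is a rough Python counterpart of that final preg_match call, again using Python's re module as a stand-in for PCRE. Python patterns have no delimiters, so the forward slash needs no special treatment, and in a raw string the backslash is written once as \\ inside the class (a PHP single-quoted string usually needs \\\\ to produce the same regex). The helper name is_valid_path and the re.ASCII flag are assumptions for illustration.

```python
import re

# Hypothetical helper playing the role of preg_match('#...#', $path):
# the three-step pattern compiled once. No delimiters are needed in Python.
VALID_PATH = re.compile(r"^(?!.*(?:^|[/\\])\.\.[/\\])[\w./\\-]+$", re.ASCII)


def is_valid_path(path: str) -> bool:
    # fullmatch requires the whole string to match, so a trailing "\n"
    # (which $ alone would tolerate) is rejected as well.
    return VALID_PATH.fullmatch(path) is not None


if __name__ == "__main__":
    for p in ["/user/temp", "/../user", "/user/temp.../foo"]:
        print(p, is_valid_path(p))
```

Compiling the pattern once and reusing it is the usual choice when the same check runs on many paths.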