Our web application allows users to specify their own “slugs” which can include relative paths e.g. /somedir/some-file.htm.
In our routing configuration we need to ensure that only valid slugs (with segments) are supported.
The regex I am using is:
(^[a-z0-9])([a-z0-9-/]+)([a-z0-9])$
This means:
- A valid slug will match e.g. some-file.htm
- A valid slug with segments (relative path) will match e.g. somedir/subdir/some-file.htm
- Absolute URLs will NOT match e.g. /somedir/some-file.htm
- A leading or trailing / or - will not match e.g. -slug-
Unfortunately it also means that double slashes will match e.g. somedir//subdir//some-file.htm because my expression is allowing one or more slashes.
How can I change it to allow only a single slash between segments?
I thought that:
(^[a-z0-9])(/?[a-z0-9-]+/?)([a-z0-9])$
would work but it does not.
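To see why the second attempt fails, it may help to run both patterns against a few slugs. The following is a minimal Python sketch; the sample slugs and the helper name `matches` are assumptions made for illustration, and periods are left out of the samples so that only the slash handling varies.

```python
import re

# The two patterns from the question, copied unchanged.
ORIGINAL = re.compile(r"(^[a-z0-9])([a-z0-9-/]+)([a-z0-9])$")
ATTEMPT = re.compile(r"(^[a-z0-9])(/?[a-z0-9-]+/?)([a-z0-9])$")

# Hypothetical sample slugs (not from the question).
SAMPLES = ["some-file", "somedir/some-file", "somedir//some-file", "a/bc/d"]


def matches(pattern, slug):
    """Return True if the compiled pattern matches the slug."""
    return pattern.match(slug) is not None


for slug in SAMPLES:
    print(f"{slug!r}: original={matches(ORIGINAL, slug)}, "
          f"attempt={matches(ATTEMPT, slug)}")
```

In the attempt, each /? can fire only once, so a slash is accepted only directly after the first character or directly before the last one. That is why a/bc/d passes while somedir/some-file does not.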
^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*$
EDIT: Use this one if you like the first regex:
^(?!-)[a-z0-9-]+(?<!-)(/(?!-)[a-z0-9-]+(?<!-))*$
It looks messy and complicated, but it seems to be correct per your spec.
[a-z0-9]([a-z0-9-]*[a-z0-9])? matches a single name, ignoring /s for the moment.
The rest of the pattern is a single slash followed by that same thing again, repeated zero or more times.
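The "one name, then zero or more slash-plus-name groups" structure can be made explicit by building the pattern from a single segment string. This is only a sketch in Python (the question does not say which regex engine the routing configuration uses); the names `SEGMENT`, `SLUG` and `is_valid_slug` are hypothetical.

```python
import re

# One name: starts and ends with a letter or digit, hyphens only inside.
SEGMENT = r"[a-z0-9]([a-z0-9-]*[a-z0-9])?"

# First name, then zero or more "/name" groups.
SLUG = re.compile(rf"^{SEGMENT}(/{SEGMENT})*$")


def is_valid_slug(slug):
    """Return True if the whole string is a slug with single-slash separators."""
    # fullmatch also rejects a trailing newline, which `$` alone tolerates in Python.
    return SLUG.fullmatch(slug) is not None


for slug in ["some-file", "somedir/subdir/some-file", "somedir//subdir",
             "/somedir/some-file", "-slug-", "somedir/"]:
    print(f"{slug!r}: {is_valid_slug(slug)}")
```

Because every slash must be followed by a complete name, double slashes, leading slashes and trailing slashes all fail without any special casing.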
As mentioned in Karoly’s answer, this does not allow literal periods; for instance, “some-file.htm” will not match the regex I wrote.
If you do want periods to match, then you’ll actually want:
^[a-z0-9]([a-z0-9-\.]*[a-z0-9])?(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)*$
Finally, if you want to allow literal periods in only the last section then you’ll want:
^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)?$
EDIT: A thought occurs that this can be simplified a bit using lookaheads and behinds.
^[a-z0-9]([a-z0-9-]*[a-z0-9])?(/[a-z0-9]([a-z0-9-]*[a-z0-9])?)*(/[a-z0-9]([a-z0-9-\.]*[a-z0-9])?)?$
becomes:
^(?!-)[a-z0-9-]+(?<!-)(/(?!-)[a-z0-9-]+(?<!-))*(/(?![-.])[a-z0-9-\.]+(?<![-.]))?$
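A quick way to compare the period-aware variants is to run them side by side. The sketch below is in Python and makes a few assumptions: the sample slugs and all constant names are hypothetical, the character class is written as [a-z0-9.-] (the same set as [a-z0-9-\.]), and the lookarounds on the last section use a character class, (?![-.]) and (?<![-.]), so that either a hyphen or a period at the edge of the name is rejected.

```python
import re

NAME = r"[a-z0-9]([a-z0-9-]*[a-z0-9])?"     # hyphens allowed inside a name
DOTTED = r"[a-z0-9]([a-z0-9.-]*[a-z0-9])?"  # hyphens and periods allowed inside

# Periods allowed in every section.
PERIODS_ANYWHERE = re.compile(rf"^{DOTTED}(/{DOTTED})*$")

# Periods allowed only in an optional final section after a slash.
PERIODS_LAST = re.compile(rf"^{NAME}(/{NAME})*(/{DOTTED})?$")

# Lookaround form of PERIODS_LAST.
LOOKAROUND = re.compile(
    r"^(?!-)[a-z0-9-]+(?<!-)"
    r"(/(?!-)[a-z0-9-]+(?<!-))*"
    r"(/(?![-.])[a-z0-9.-]+(?<![-.]))?$"
)

# Hypothetical sample slugs.
SAMPLES = ["somedir/some-file.htm", "some.dir/some-file.htm", "some-file.htm",
           "somedir/sub/file.tar.gz", "a/-b", "a/.b", "a/b.", "a//b.htm"]

for slug in SAMPLES:
    explicit = PERIODS_LAST.fullmatch(slug) is not None
    lookaround = LOOKAROUND.fullmatch(slug) is not None
    anywhere = PERIODS_ANYWHERE.fullmatch(slug) is not None
    print(f"{slug!r}: anywhere={anywhere}, last={explicit}, lookaround={lookaround}")
```

Note that, as written, the last-section patterns accept a period only in a section that follows a slash, so a bare some-file.htm matches the periods-anywhere pattern but not the other two.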