This is a bit hard to sum up in a title, but here is my problem:
(?:(?:http|https):\\/\\/)?(?:\\/\\/www\\.)?youtube.com\\/watch\\?(?:.*)v=(\\w{11}).*
Given the expression above, I really don’t understand why ftp://www.youtube.com/watch?v=F5eScJmYZZ8 matches. I tried adding ^ to the beginning of the expression, but then it no longer matches anything (this is done in Java, which explains the doubled backslashes).
How can ftp be accepted when it is clearly not listed in (http|https)?
EDIT
To be accurate, here is what is allowed:
- http(s)://www.[…]
- http(s)://[…]
- http://www.[…]
- […]
and nothing else.
Because the leading (?:(?:http|https):\\/\\/)? is optional. That’s what the question mark at the end of the group signifies (match at most one, i.e. match only if it exists).
A leading ^ should prevent the match with ftp though. Can you post the failing regex you tried (with the ^)?
UPDATE:
Aha! It matches without the ^ since the http group is optional, and anything can come before the match (e.g. cheeseyoutube.com/... would match). Adding a ^ to the beginning of the regex fixes this, but there’s another problem with your regex: the www group is trying to match two slashes (as first pointed out in Justin’s answer), which it can’t once the http group has already matched those slashes. So the www group fails to match (fine, since it’s optional), but then the youtube part can’t match since there’s an unmatched www in the way!
This should fix your problem:
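As a minimal sketch (not necessarily the exact pattern this answer originally gave), the version below anchors the expression with ^ and lets the www group match www. without the two slashes. It assumes Matcher.find() is what was being called, and that a bare www. prefix without a scheme is acceptable. The class name YoutubeUrlDemo, the helper videoId, and the constants ORIGINAL, ANCHORED_ONLY and FIXED are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class YoutubeUrlDemo {
    // The expression from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile(
            "(?:(?:http|https):\\/\\/)?(?:\\/\\/www\\.)?youtube.com\\/watch\\?(?:.*)v=(\\w{11}).*");

    // The same expression with only a leading ^ added (what the question tried).
    static final Pattern ANCHORED_ONLY = Pattern.compile("^" + ORIGINAL.pattern());

    // Hypothetical corrected expression: anchored at the start, and the
    // optional www group no longer expects a second pair of slashes.
    static final Pattern FIXED = Pattern.compile(
            "^(?:https?:\\/\\/)?(?:www\\.)?youtube\\.com\\/watch\\?(?:.*)v=(\\w{11}).*");

    // Returns the captured 11-character video id, or null if find() fails.
    // find() searches for a match anywhere in the input unless the
    // pattern itself is anchored.
    static String videoId(Pattern pattern, String url) {
        Matcher m = pattern.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] urls = {
            "ftp://www.youtube.com/watch?v=F5eScJmYZZ8",
            "http://www.youtube.com/watch?v=F5eScJmYZZ8",
            "https://youtube.com/watch?v=F5eScJmYZZ8",
            "youtube.com/watch?v=F5eScJmYZZ8",
            "cheeseyoutube.com/watch?v=F5eScJmYZZ8"
        };
        for (String url : urls) {
            System.out.printf("%s%n  original=%s anchoredOnly=%s fixed=%s%n",
                    url,
                    videoId(ORIGINAL, url),
                    videoId(ANCHORED_ONLY, url),
                    videoId(FIXED, url));
        }
    }
}
```

Under those assumptions, ORIGINAL still finds a video id inside the ftp URL, ANCHORED_ONLY rejects even the http://www. form (the www problem described above), and FIXED accepts the http(s) and bare forms while rejecting ftp and cheeseyoutube.com. Calling Matcher.matches() instead of find() would additionally require the whole input to match.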