I need one or more regular expressions to match some invalid URLs of a website: those that have uppercase letters before OR after a certain pattern.
These are the structural rules for the invalid URLs:
- a defined website
- zero or more uppercase letters (at least one if there are none after the pattern)
- a pattern
- zero or more uppercase letters (at least one if there are none before the pattern)
To be explicit, here are some examples:
http://website/uppeRcase/pattern/upperCase // match it, uppercase before and after pattern
http://otherweb/WhatevercAse/pattern/whatevercase // do not match, different website
http://website/lowercase/pattern/lowercase // do not match, no uppercase before or after pattern
http://website/lowercase/pattern/uppercasE // match it, uppercase after pattern
http://website/Uppercase/pattern/lowercase // match it, uppercase before pattern
http://website/WhatevercAse/asdasd/whatEveRcase // do not match it, no pattern
Thanks in advance for your help!
Mario
To match uppercase letters you simply need [A-Z]; then build the rest of your rules around that. Without knowing exactly what you mean by “website” and “pattern”, it is difficult to give better guidance.
This expression will match if there are uppercase characters both between “website” and “pattern” and after “pattern”:
^http://website/.*[A-Z]+.*/pattern/.*[A-Z]+.*$
This expression will match if there are uppercase characters on either side of “pattern”:
^http://website/(.*[A-Z]+.*/pattern/.*[A-Z]+.*|.*[A-Z]+.*/pattern/.*|.*/pattern/.*[A-Z]+.*)$
UPDATE:
To @TokenMacGuy’s point, parsing URLs with regular expressions can be very tricky. If you want to break the URL into parts and then validate each part, you can start with this expression, which should match and group most* URLs.
(?<protocol>(http|ftp|https|ftps):\/\/)?(?<site>[\w\-_\.]+\.(?<tld>([0-9]{1,3})|([a-zA-Z]{2,3})|(aero|arpa|asia|coop|info|jobs|mobi|museum|name|travel))+(?<port>:[0-9]+)?\/?)((?<resource>[\w\-\.,@^%:/~\+#]*[\w\-\@^%/~\+#])(?<queryString>(\?[a-zA-Z0-9\[\]\-\._+%\$#\~',/]*=[a-zA-Z0-9\[\]\-\._+%\$#\~',/]*)+(&[a-zA-Z0-9\[\]\-\._+%\$#\~',/]*=[a-zA-Z0-9\[\]\-\._+%\$#\~',/]*)*)?)?
*It worked in all my tests, but I can’t claim I was exhaustive.
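To sanity-check the “both sides” and “either side” expressions from this answer against the example URLs in the question, here is a minimal Python harness. Python is only my choice for the sketch; the expressions are copied unchanged, and “website” and “pattern” are still the question’s placeholders, matched literally. The alternation is split into adjacent string literals purely so each branch can carry a comment; Python joins them back into the same expression.

```python
import re

# Both expressions are copied from the answer above. "website" and "pattern"
# are the question's placeholders and are matched literally here.
BOTH_SIDES = re.compile(r"^http://website/.*[A-Z]+.*/pattern/.*[A-Z]+.*$")

# Same alternation as the "either side" expression, split into adjacent
# string literals (which Python concatenates) so each branch can be commented.
EITHER_SIDE = re.compile(
    r"^http://website/("
    r".*[A-Z]+.*/pattern/.*[A-Z]+.*"  # uppercase before and after
    r"|.*[A-Z]+.*/pattern/.*"         # uppercase before the pattern
    r"|.*/pattern/.*[A-Z]+.*"         # uppercase after the pattern
    r")$"
)

# (url, should EITHER_SIDE match?) taken from the question's examples
EXAMPLES = [
    ("http://website/uppeRcase/pattern/upperCase", True),
    ("http://otherweb/WhatevercAse/pattern/whatevercase", False),
    ("http://website/lowercase/pattern/lowercase", False),
    ("http://website/lowercase/pattern/uppercasE", True),
    ("http://website/Uppercase/pattern/lowercase", True),
    ("http://website/WhatevercAse/asdasd/whatEveRcase", False),
]


def main() -> None:
    for url, expected in EXAMPLES:
        either = EITHER_SIDE.match(url) is not None
        both = BOTH_SIDES.match(url) is not None
        print(f"either={either!s:5} both={both!s:5} {url}")
        assert either == expected, url


if __name__ == "__main__":
    main()
```

Of these examples, only the first URL satisfies the stricter “both sides” expression. Also note that the first branch of the alternation is already covered by the other two, so dropping it should not change what matches.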
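As a sketch of the “break into parts, then validate” idea, the check below splits the URL with Python’s standard urllib.parse.urlsplit instead of the named-group expression above (Python’s built-in re module spells named groups (?P<name>...), so that expression would need adapting anyway). The helper name is_invalid_url and its host and marker defaults are hypothetical, standing in for the question’s “website” and “pattern”. It assumes the pattern is a whole path segment, only inspects the path (not the query string), and only accepts the http scheme used in the examples.

```python
import re
from urllib.parse import urlsplit

UPPERCASE = re.compile(r"[A-Z]")  # same character class as in the answer


def is_invalid_url(url: str, host: str = "website", marker: str = "pattern") -> bool:
    """Hypothetical helper: True if `url` is an http URL on `host`, has `marker`
    as a whole path segment, and has an uppercase ASCII letter before or after it."""
    parts = urlsplit(url)
    # .hostname is lowercased by urlsplit, so the host check is case-insensitive.
    if parts.scheme != "http" or parts.hostname != host:
        return False
    segments = parts.path.split("/")  # "/a/pattern/b" -> ["", "a", "pattern", "b"]
    if marker not in segments:
        return False
    i = segments.index(marker)  # first occurrence of the marker segment
    before = "/".join(segments[:i])
    after = "/".join(segments[i + 1:])
    return bool(UPPERCASE.search(before) or UPPERCASE.search(after))


if __name__ == "__main__":
    for url in (
        "http://website/uppeRcase/pattern/upperCase",
        "http://website/lowercase/pattern/lowercase",
        "http://website/WhatevercAse/asdasd/whatEveRcase",
    ):
        print(is_invalid_url(url), url)
```

Splitting first keeps each rule (host, pattern, uppercase) as a separate, readable condition, which is usually easier to adjust than one large alternation.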