I have a URL pattern that needs to contain either APPLES or ORANGES in it, no other value. Optionally, it can also have query parameters. I’ve tried a number of RegEx patterns, but I just can’t get a pattern that will respect the strict match.
Sample URLs
Good
http://www.website.com/en/pages/APPLES
http://www.website.com/en/pages/APPLES?k=v
http://www.website.com/en/pages/ORANGES?k=v&k2=v2
http://www.website.com/en/pages/ORANGES
Bad
http://www.website.com/en/pages/APPLES???k=v
http://www.website.com/en/pages/APPLES?k=v=v
http://www.website.com/en/pages/APPLESORANGES
http://www.website.com/en/pages/1APPLES
http://www.website.com/en/APPLES
Attempted RegEx Patterns (well, at least the best attempts)
(http://*.*.website*.*.com/*.*/pages(/APPLES)|(/ORANGES)[\?]*.*)
(http://*.*.website*.*.com/*.*/pages(/APPLES|/ORANGES)[\?]*.*)
If you’re curious, I intentionally want to allow any sub-domain, any suffix after “website” (for different environments), and any path between .com/ and /pages, hence the wildcards in a number of places.
What would be the best way to achieve this?
**Edit: Final Answer**
My final answer merges the answers from mathematical.coffee and fardjad.
^https?://.*\.website\b.*\.com/.*/pages/(APPLES\b|ORANGES\b)((\?\w+=\w+)(&\w+=\w+)*)?$
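A minimal sketch for checking the final pattern against the sample URLs from the question. It uses Python's re module as a stand-in for .NET's Regex (an assumption: for these ASCII URLs, ^, $, \b and \w behave the same in both). It requires a literal & between pairs; with an optional &? the pattern would also accept a run-together query such as ?a=bc=d.

```python
import re

# Final pattern from the edit above, with "&" required between key=value pairs.
FINAL = re.compile(
    r"^https?://.*\.website\b.*\.com/.*/pages/"
    r"(APPLES\b|ORANGES\b)"
    r"((\?\w+=\w+)(&\w+=\w+)*)?$"
)

# Sample URLs copied from the question.
GOOD = [
    "http://www.website.com/en/pages/APPLES",
    "http://www.website.com/en/pages/APPLES?k=v",
    "http://www.website.com/en/pages/ORANGES?k=v&k2=v2",
    "http://www.website.com/en/pages/ORANGES",
]
BAD = [
    "http://www.website.com/en/pages/APPLES???k=v",
    "http://www.website.com/en/pages/APPLES?k=v=v",
    "http://www.website.com/en/pages/APPLESORANGES",
    "http://www.website.com/en/pages/1APPLES",
    "http://www.website.com/en/APPLES",
]


def is_match(url):
    """True if the whole URL matches FINAL (the pattern is anchored with ^ and $)."""
    return FINAL.match(url) is not None


if __name__ == "__main__":
    for url in GOOD + BAD:
        print(is_match(url), url)
```

Run as a script, it should print True for the four Good URLs and False for the five Bad ones.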
The only limitation I’ve discovered is that the pattern does not allow a few valid characters (. ~ - % +) in the query-string key=value pairs (see: http://en.wikipedia.org/wiki/Query_string#Structure); the underscore is fine, since \w matches it. This isn’t an issue for me, as I’m matching against a string returned from .NET’s Uri class, so I know the URL is well-formed overall.
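If those extra characters are ever needed, one option is to widen \w in the key=value pairs to a character class. This is a sketch only: the class [\w.~%+-] is an assumption, not something taken from the linked article, and it accepts % without checking that two hex digits follow.

```python
import re

# Hypothetical widened class: \w plus . ~ % + and - (hyphen last so it is literal).
# Note: "%" is accepted without checking that two hex digits follow it.
PAIR = r"[\w.~%+-]+=[\w.~%+-]+"

WIDE = re.compile(
    r"^https?://.*\.website\b.*\.com/.*/pages/(APPLES\b|ORANGES\b)"
    rf"(\?{PAIR}(&{PAIR})*)?$"
)


def is_match_wide(url):
    """True if the whole URL matches WIDE."""
    return WIDE.match(url) is not None
```

Because "=" and "?" are still outside the class, ?k=v=v and ???k=v remain rejected.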
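I think the *.* should be .*. In a regex, * repeats the token just before it, so /* means “zero or more slashes” and website* means “websit followed by zero or more e’s”; it is .* that matches any run of characters.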
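To see why the attempts accept too much, this sketch runs the question's second attempt, copied verbatim, against the Bad samples. It assumes the pattern was applied unanchored, the way .NET's Regex.IsMatch succeeds on a match anywhere in the input; Python's re.search is used as the equivalent.

```python
import re

# The question's second attempt, verbatim. "/*" means "zero or more slashes",
# "website*" means "websit" plus zero or more "e", and the trailing ".*" accepts anything.
ATTEMPT_2 = re.compile(r"(http://*.*.website*.*.com/*.*/pages(/APPLES|/ORANGES)[\?]*.*)")

BAD = [
    "http://www.website.com/en/pages/APPLES???k=v",
    "http://www.website.com/en/pages/APPLES?k=v=v",
    "http://www.website.com/en/pages/APPLESORANGES",
    "http://www.website.com/en/pages/1APPLES",
    "http://www.website.com/en/APPLES",
]


def wrongly_accepted(pattern, urls):
    """Return the URLs for which the pattern matches anywhere (re.search, no anchors)."""
    return [u for u in urls if pattern.search(u)]


if __name__ == "__main__":
    print(wrongly_accepted(ATTEMPT_2, BAD))
```

It should list the APPLES???k=v, APPLES?k=v=v and APPLESORANGES URLs: without ^ and $, nothing stops the trailing [\?]*.* from swallowing whatever follows /APPLES.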
NOTE: parsing query strings with a regex, and making sure the query string is syntactically valid, is usually problematic.
For example, in the regex I supplied above, I’ve said that the value in &key=value can’t have an ampersand in it. But it could be an escaped entity, like &amp;, which is legal. You’ll always suffer from this sort of problem when you try to parse syntax with regex. It’s a risk you’ll have to take.
Alternatively, .NET already has classes that parse URLs (System.Uri for the overall structure and System.Web.HttpUtility.ParseQueryString for the query string), as many other languages do, and they take care of all these special cases for you.
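A rough sketch of the library route, using Python's urllib.parse as a stand-in for the .NET URL-parsing classes. The host and path rules below are a hypothetical approximation of the regex (some host label starting with “website”, a final “com” label, and a path ending in /pages/APPLES or /pages/ORANGES), not an exact translation.

```python
from urllib.parse import urlsplit, parse_qsl

ALLOWED = {"APPLES", "ORANGES"}


def is_allowed(url):
    """Hypothetical library-based check approximating the question's rules."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False

    # Host: last label "com", and some earlier label starting with "website"
    # (e.g. www.website.com or dev.website-qa.com). Looser than the regex.
    labels = (parts.hostname or "").split(".")
    if labels[-1] != "com" or not any(label.startswith("website") for label in labels[:-1]):
        return False

    # Path: at least one segment after ".com/", then "pages", then exactly APPLES or ORANGES.
    segments = parts.path.split("/")
    if len(segments) < 4 or segments[-2] != "pages" or segments[-1] not in ALLOWED:
        return False

    # Query: optional; with strict_parsing, a field with no "=" raises ValueError.
    if parts.query:
        try:
            parse_qsl(parts.query, strict_parsing=True)
        except ValueError:
            return False
    return True
```

Here a percent-encoded ampersand is handled for you: parse_qsl("k=a%26b") yields the value "a&b". Note that the library is more permissive than the question's Bad list: ?k=v=v and ???k=v still pass, because = and ? are legal inside a query (the first parses as key k, value v=v), so rejecting them would need an explicit extra check.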