I am using the XML regex pattern to match my proxy URL.
e.g. Proxy: ab-proxy-sample.company.com:8080
My requirements:
- Should not begin with http:// or https:// (match the whole word)
- Should accept any string + a port
- Should accept even strings starting with ht
My current XML regex is: [^http://|https://].+:[0-9]+|
But it's matching each letter instead of the whole word.
Any help would be highly appreciated.
Thanks in advance !
As @arnep points out, you’re attempting to use a negated character class with alternation, which isn’t the way it works. Also, here is some information regarding lookaheads.
I’m sure someone else will post an answer you can copy and paste, but this is a useful opportunity to learn the basics of regex!
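To make the character-class point concrete, here is a small Python sketch. It is only an illustration: Python's re.fullmatch stands in for the XML engine's implicit anchoring, the trailing | from the question's pattern is left off, and the names QUESTION_PATTERN, LOOKAHEAD_PATTERN and accepted_by are made up for this example. The lookahead variant only helps on engines that support (?!...).

```python
import re
import string

# Pattern from the question (without the trailing "|"). Inside [...], the
# characters form an unordered set, so the class excludes ONE character only.
QUESTION_PATTERN = re.compile(r"[^http://|https://].+:[0-9]+")

# The same class with duplicates removed: "not one of h, t, p, s, :, /, |".
QUESTION_CLASS = re.compile(r"[^http://|https://]")
EQUIVALENT_CLASS = re.compile(r"[^htps:/|]")

# Lookahead version, for engines that support (?!...). Illustrative only.
LOOKAHEAD_PATTERN = re.compile(r"(?!https?://)\S+:[0-9]+")


def accepted_by(pattern: re.Pattern, text: str) -> bool:
    """Return True if the whole string matches the pattern."""
    return pattern.fullmatch(text) is not None


# Both classes accept exactly the same single characters.
same_class = all(
    (QUESTION_CLASS.fullmatch(c) is None) == (EQUIVALENT_CLASS.fullmatch(c) is None)
    for c in string.printable
)

for candidate in [
    "ab-proxy-sample.company.com:8080",
    "htproxy.company.com:8080",
    "proxy.company.com:8080",
    "http://ab-proxy-sample.company.com:8080",
]:
    print(
        candidate,
        accepted_by(QUESTION_PATTERN, candidate),
        accepted_by(LOOKAHEAD_PATTERN, candidate),
    )
```

Note that the question's pattern rejects any proxy whose first character happens to be h, t, p or s, which is why it cannot satisfy the "strings starting with ht" requirement.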
UPDATE:
I didn’t realize that you were using an engine that doesn’t support negative lookarounds. Without negative lookarounds, it’s nearly impossible to achieve what you’re trying to do.
Nearly 😉
Here is a “brute-force” combinatoric method of doing it:
If the XML engine doesn’t support non-captured groups, i.e. (?: ... ), then use regular groups instead:
If the XML engine doesn’t support character classes like \S and \d, then use [^ \t\r\n\p] and [0-9] instead.
Here is a running example: http://rubular.com/r/JnpCVgeLmL. Try changing the test string. You’ll see that…
Note that you do not need the ^ and $. I added these specifically for the Rubular demo, but apparently the XML engine assumes this condition (anchored-ness).
How does this work? It’s easier to understand if we break it up like this:
The explanation:
Here, it gets tricky: now we encounter three branches.
And finally, if we’ve gotten this far, then we look for a string of non-whitespace characters, followed by a colon, followed by a string of digits.
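As a rough illustration of that branching, here is a Python sketch of a lookaround-free pattern. It is not the original pattern from this answer: the branch layout and the names BRUTE_FORCE, WITH_LOOKAHEAD and is_valid_proxy are assumptions made for this example, and re.fullmatch stands in for the XML engine's implicit anchoring. The lookahead pattern is there only as a reference to compare against.

```python
import re

# Assumed reconstruction: every way the non-whitespace part before the port
# can avoid starting with "http://" or "https://", spelled out branch by branch.
BRUTE_FORCE = re.compile(r"""
    (?:
        [^h\s]\S*                # first character is not 'h'
      | h(?:[^t\s]\S*)?          # 'h' alone, or 'h' then anything but 't'
      | ht(?:[^t\s]\S*)?         # 'ht' alone, or 'ht' then anything but 't'
      | htt(?:[^p\s]\S*)?        # 'htt' alone, or 'htt' then anything but 'p'
      | http(?:[^s:\s]\S*)?      # 'http' alone, or then neither 's' nor ':'
      | https(?:[^:\s]\S*)?      # the 's' branch: 'https' then anything but ':'
      | https?:(?:[^/\s]\S*)?    # the ':' branch: then anything but '/'
      | https?:/(?:[^/\s]\S*)?   # one slash, then anything but a second '/'
    )
    :\d+                         # colon followed by the port digits
""", re.VERBOSE)

# Reference pattern for engines that do support negative lookahead.
WITH_LOOKAHEAD = re.compile(r"(?!https?://)\S+:\d+")


def is_valid_proxy(text: str) -> bool:
    """Return True if the whole string is host:port without an http(s):// prefix."""
    return BRUTE_FORCE.fullmatch(text) is not None


SAMPLES = [
    "ab-proxy-sample.company.com:8080",
    "htproxy.company.com:8080",
    "http:8080",
    "https-proxy:1",
    "http:/x:1",
    "http://ab-proxy-sample.company.com:8080",
    "https://ab-proxy-sample.company.com:8080",
    "proxy.company.com",
    "no port here",
]

for sample in SAMPLES:
    agrees = is_valid_proxy(sample) == (WITH_LOOKAHEAD.fullmatch(sample) is not None)
    print(sample, is_valid_proxy(sample), "agrees with lookahead:", agrees)
```

For an engine without \S and \d, the same sketch would need the substitutions mentioned earlier, and the (?: ... ) groups could be swapped for plain groups.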
I leave it to a smarter mathematician than myself to ponder whether all strings matchable using lookarounds can be “brute-forced” in such a way.