How can I write a regular expression to search for a string that contains “http://” AND does not contain “mysite.com”?
WARNING
Attempting to rope regexes into boolean logic that is best accomplished in a proper programming language is a thankless job. While it is possible to write /PAT1/ and not /PAT2/ using complex lookaheads so that it is just one pattern, it is a painful task. You don't want to do it this way! You should have explained what you were really doing in the first place: some sort of match operation in a text editor. You didn't. So you get a general answer that is going to be challenging to adapt to your localized situation.
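For contrast, here is a minimal sketch of the "proper programming language" route, written in Python rather than Perl (an assumption made for illustration). It uses two plain patterns joined with ordinary boolean `and`/`not`. The names `HAS_HTTP`, `HAS_MYSITE`, and `wanted` are hypothetical.

```python
import re

# Two simple, readable patterns; the boolean logic lives in the host language.
HAS_HTTP = re.compile(r"http://")
HAS_MYSITE = re.compile(r"mysite\.com")  # the dot is escaped so it matches a literal "."


def wanted(line: str) -> bool:
    """True if the line contains "http://" but does not contain "mysite.com"."""
    return bool(HAS_HTTP.search(line)) and not HAS_MYSITE.search(line)


if __name__ == "__main__":
    for s in ["see http://example.org", "see http://mysite.com/page", "no link here"]:
        print(wanted(s), s)
```

Each condition stays a one-line regex, and adding a third condition is just another `and`.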
Quick Answer
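Using Perl syntax, you could combine both conditions into a single pattern, with a positive lookahead for http:// and a negative lookahead for mysite.com, and stick that (pre-)compiled pattern into a variable for future use with qr//.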
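A minimal sketch of that single-pattern approach, assuming Python's `re` module (its `(?=...)` and `(?!...)` lookahead syntax behaves like Perl's for this purpose, and `re.compile` plays the role of `qr//`). The names `HTTP_NOT_MYSITE` and `matches` are hypothetical, and the pattern is meant to be applied to one line at a time.

```python
import re

# One pattern doing both jobs, anchored at the start of the string:
#   (?=.*http://)      positive lookahead: "http://" appears somewhere after ^
#   (?!.*mysite\.com)  negative lookahead: "mysite.com" appears nowhere after ^
# Without re.DOTALL, "." does not cross newlines, so only the first line is examined.
HTTP_NOT_MYSITE = re.compile(r"^(?=.*http://)(?!.*mysite\.com)")


def matches(line: str) -> bool:
    # The pattern consumes nothing; a successful zero-width match means "yes".
    return HTTP_NOT_MYSITE.search(line) is not None


if __name__ == "__main__":
    for s in ["go to http://example.org now", "http://mysite.com/x", "plain text"]:
        print(matches(s), s)
```

Note that the match is zero-width, which is exactly why it says nothing about where a link sits in the line.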
However, if your goal is to pull out all such links, that isn’t going to help you, because those lookaheads do not tell you where in the string your link occurs.
In that case, it’s a lot easier to write something to pull out the links and then filter out your unwanted cases afterward, using two separate regexes instead of trying to make one do everything.
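The extract-then-filter idea can be sketched as below, again in Python as an assumption. The first regex is a deliberately loose guess at what a link looks like; the second is used only to reject unwanted links. `LINK`, `UNWANTED`, and `external_links` are hypothetical names.

```python
import re

# Pass 1: grab anything that starts with http:// and runs until whitespace,
# a quote, or an angle bracket. This is intentionally loose.
LINK = re.compile(r"""http://[^\s"'<>]+""")

# Pass 2: a separate pattern whose only job is to reject links.
UNWANTED = re.compile(r"mysite\.com", re.IGNORECASE)


def external_links(text: str) -> list[str]:
    """Return each http:// link found in text whose URL does not mention mysite.com."""
    return [
        m.group(0)
        for m in LINK.finditer(text)
        if not UNWANTED.search(m.group(0))
    ]


if __name__ == "__main__":
    print(external_links("See http://example.org/a and http://mysite.com/b"))
```

Because `finditer` yields match objects, the position of each link is also available via `m.start()` if you need it. As with the approach described here, trailing punctuation such as a sentence-final period is kept as part of the link.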
Note that no attempt is made to match only links that contain valid URL characters, or that there is no accidental trailing punctuation as so often occurs in plain text.
And now, for a real answer
Note also that if you’re parsing HTML with this, the approach outlined above is just a quick-and-dirty, fast-and-loose, shoot-from-the-hip kind of link extraction. It’s easy to construct valid input that turns up a lot of false positives, and not altogether hard to construct input that produces false negatives, too.
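To illustrate the false-positive problem, here is a small sketch, assuming the goal is real anchor and image links. A loose link regex (a hypothetical `LINK`, as in the two-pass approach) is run over perfectly valid HTML whose only URLs sit in a comment and in ordinary paragraph text.

```python
import re

LINK = re.compile(r"""http://[^\s"'<>]+""")

# Valid HTML with no <a href> or <img src> at all: one URL is inside a
# comment, the other is just text in a paragraph.
html = """
<!-- old link, no longer used: http://example.org/stale -->
<p>Type http://example.org/literal into your browser.</p>
"""

# The regex cannot tell markup from comments or text, so it reports both.
print(LINK.findall(html))
```

Neither hit is an actual link element, yet both are reported.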
A full program, in contrast, can dump out the link addresses of all the <a ...> and <img ...> elements in the pages named by its URL arguments, and actually do so correctly because it uses a real parser. Run on a URL, such a program gives an accounting of all the anchor and image links on that page.
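A minimal sketch of the parser-based approach, using Python's standard `html.parser` module as a stand-in for whatever parser the original program used. To stay self-contained it parses an HTML string instead of fetching URLs. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    # Which attribute holds the link address for each tag of interest.
    LINK_ATTR = {"a": "href", "img": "src"}

    def __init__(self) -> None:
        super().__init__()
        self.links: list[tuple[str, str]] = []  # (tag, url) pairs in document order

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value is None for a bare attribute.
        wanted = self.LINK_ATTR.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value is not None:
                self.links.append((tag, value))

    # Comments and text are never reported, because only handle_starttag is overridden.


def collect_links(html: str) -> list[tuple[str, str]]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<!-- http://example.org/stale --><a HREF="http://example.org/a">x</a><img src="/logo.png"/>'
    print(collect_links(page))
```

To answer the original question with this, you would then keep only the collected URLs that start with http:// and do not mention mysite.com.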
For any work more serious than running a quick grep over a file to eyeball general results, you need to use a proper parser to do this sort of thing.