I would like to match URLs that are outside of JavaScript comments.
Regex for URLs:
((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)
Given this example:
/* http://goog.le */
http://goog.le
It should match only the second one.
So far I have tried this regex, without success:
(/*)[^(*/)]*((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)
Thanks for any advice.
In general it's hard (but certainly not impossible) to do this sort of parsing with regex; you have to make assumptions, such as the input being well-formed.
First, note that the {1} in your regex is redundant, so it can be removed.
You can match the URL only if it's not followed by a */ (with no matching /*). The logic is that if it is followed by */, it's probably in a comment.
Of course, this will fail if you have */ in the source without a matching /*.
I think any regex approach you take will in some way rely on the input being well-formed: comments are balanced, etc.
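As a rough illustration of the lookahead idea described above, here is a minimal JavaScript sketch. It is not the exact regex from this answer: the function name findUrlsOutsideComments is made up for the example, and the extra (?!\*\/) guard inside the URL part is an assumption added so that a URL written directly against a closing */ does not swallow it.

```javascript
// URL pattern from the question with the redundant {1} removed.
// (?:(?!\*\/)\S)+  : URL characters, stopping before a closing "*/" (added assumption)
// (?!(?:(?!\/\*)[\s\S])*\*\/) : reject the match if a "*/" follows
//                               with no "/*" between the URL and that "*/"
const urlOutsideComment =
  /(?:mailto:|(?:news|(?:ht|f)tps?):\/\/)(?:(?!\*\/)\S)+(?!(?:(?!\/\*)[\s\S])*\*\/)/g;

// Hypothetical helper: returns every URL that does not look like it sits
// inside a /* ... */ comment. Assumes comments are balanced.
function findUrlsOutsideComments(source) {
  return source.match(urlOutsideComment) || [];
}

const sample = "/* http://goog.le */\nhttp://goog.le";
// Only the second URL should be reported.
console.log(findUrlsOutsideComments(sample));
```

A stray */ with no opening /* still suppresses every URL before it, which is the failure case mentioned above.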
(If you are using JavaScript, is it possible to use some sort of XML parsing instead? This works much better and will probably allow you to ignore comments in any case.)
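In the same spirit of ignoring comments before looking for URLs, a simpler two-pass alternative can be sketched in JavaScript. This is not XML parsing and not a real JavaScript parser; it is a swapped-in technique that blanks out /* ... */ blocks first and then applies the URL pattern from the question. The function name is hypothetical, and the sketch assumes that /* never appears inside a string literal or inside a URL.

```javascript
// Hypothetical helper: remove block comments, then match URLs in what is left.
function findUrlsAfterStrippingComments(source) {
  // Replace each /* ... */ block (non-greedy, may span lines) with a space
  // so that the text on either side of the comment does not get glued together.
  const withoutComments = source.replace(/\/\*[\s\S]*?\*\//g, " ");

  // URL pattern from the question, with the redundant {1} removed.
  const urlPattern = /(?:mailto:|(?:news|(?:ht|f)tps?):\/\/)\S+/g;

  return withoutComments.match(urlPattern) || [];
}

console.log(findUrlsAfterStrippingComments("/* http://goog.le */\nhttp://goog.le"));
```

An unterminated /* is left in place by this approach, so URLs after it are still reported; like the lookahead version, it relies on the input being well-formed.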