How do I find URLs (e.g. http://www.domain.com) within a document and wrap them in anchors, like <a href="www.domain.com">http://www.domain.com</a>?
HTML:
Hey dude, check out this link www.google.com and www.yahoo.com!
JavaScript:
(function(){var text = document.body.innerHTML;/*do replace regex => text*/})();
Output:
Hey dude, check out this link <a href="www.google.com">www.google.com</a> and <a href="www.yahoo.com">www.yahoo.com</a>!
Firstly, www.domain.com isn't a URL, it's a hostname, and a link with href="www.domain.com" won't work: the browser treats it as a relative URL and will look for a .com file called www.domain relative to the current page.

It's not possible to highlight hostnames in the general case, because almost anything can be a hostname. You could try to highlight 'www.something.dot.separated.words', but that isn't very reliable, and many sites don't use the www. hostname prefix at all. I'd try to avoid that.

A very liberal pattern is a reasonable starting point for detecting HTTP URLs. Depending on what sort of input you've got, you may want to narrow down what it allows, and it may be worth detecting and excluding trailing characters like . or ! that would be valid parts of the URL but in practice generally aren't. (You could use a | to allow either the URL syntax or the www.hostname syntax, if you like.)

Anyhow, once you've settled on your preferred pattern, you'll need to find that pattern in the text nodes on the page. Don't run the regexp over innerHTML markup. You'll end up completely ruining the page by trying to mark up every href="http://something" that's already inside markup. You'll also destroy any existing JavaScript references, event handlers and form field values when you replace the innerHTML content.

In general, a regexp simply cannot process HTML reliably. So take advantage of the fact that the browser has already parsed the HTML into elements and text nodes, and look only at the text nodes. You'll also want to avoid looking inside <a> elements, since marking up a URL as a link when it's already in a link is silly (and invalid).
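The approach above (match only in text nodes, skip anything already inside an <a>, and build the links as DOM nodes instead of rewriting innerHTML) could look roughly like this. It's a minimal sketch under some assumptions of my own: URL_PATTERN, TRAILING_PUNCTUATION, splitUrls and linkifyTextNodes are hypothetical names, the pattern is just one illustrative liberal regexp that only matches http:// and https:// URLs, the set of trailing characters to drop is a guess, and skipping <script>, <style> and <textarea> content is an extra precaution not mentioned above.

```javascript
// Illustrative, deliberately liberal pattern (an assumption, not a standard):
// "http://" or "https://" followed by characters that are not whitespace and
// not characters that can't appear unescaped in a URL.
const URL_PATTERN = /\bhttps?:\/\/[^\s<>"`{}|\\^\[\]]+/g;

// Trailing punctuation that is legal in a URL but usually ends the sentence.
// The exact set is a guess; tune it for your input.
const TRAILING_PUNCTUATION = /[.,!?;:]+$/;

// Hypothetical helper: split plain text into {type: "text" | "url", value} segments.
function splitUrls(text) {
  const segments = [];
  let last = 0;
  for (const match of text.matchAll(URL_PATTERN)) {
    let url = match[0];
    const trailing = url.match(TRAILING_PUNCTUATION);
    if (trailing) {
      // Leave the trailing punctuation in the following text segment.
      url = url.slice(0, url.length - trailing[0].length);
    }
    if (match.index > last) {
      segments.push({ type: "text", value: text.slice(last, match.index) });
    }
    segments.push({ type: "url", value: url });
    last = match.index + url.length;
  }
  if (last < text.length) {
    segments.push({ type: "text", value: text.slice(last) });
  }
  return segments;
}

// Hypothetical helper, browser only: linkify URLs in the text nodes under root,
// skipping text that is already inside <a> (or script/style/textarea).
function linkifyTextNodes(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT, {
    acceptNode(node) {
      const parent = node.parentElement;
      return parent && parent.closest("a, script, style, textarea")
        ? NodeFilter.FILTER_REJECT
        : NodeFilter.FILTER_ACCEPT;
    },
  });

  // Collect the nodes first; replacing nodes mid-walk could make the walker skip some.
  const textNodes = [];
  while (walker.nextNode()) textNodes.push(walker.currentNode);

  for (const node of textNodes) {
    const segments = splitUrls(node.data);
    if (!segments.some((seg) => seg.type === "url")) continue;

    // Build the replacement from DOM nodes, so the URL text is never parsed as HTML.
    const fragment = document.createDocumentFragment();
    for (const seg of segments) {
      if (seg.type === "url") {
        const a = document.createElement("a");
        a.href = seg.value;
        a.textContent = seg.value;
        fragment.appendChild(a);
      } else {
        fragment.appendChild(document.createTextNode(seg.value));
      }
    }
    node.parentNode.replaceChild(fragment, node);
  }
}
```

In a page you would call linkifyTextNodes(document.body). Only the text nodes that actually contain a URL get replaced, so the rest of the document, including existing event handlers and form field values, is left alone. Bare hostnames such as www.google.com from the question's example are deliberately not linked; if you want them, you'd extend the pattern with a | alternative and prepend "http://" when building the href.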