I have the following code on my website. It finds the images in a block of HTML whose src doesn’t start with http:// or /, and prepends the website URL to those image sources.
For example:
<img src="http://domain.com/image.jpg"> will stay the same
<img src="/image.jpg"> will stay the same
<img src="image.jpg"> will be changed to <img src="http://domain.com/image.jpg">
I feel my code is really inefficient… Any ideas on how I could make it run with less code?
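// Capture the src value of every <img> tag.
preg_match_all('/<img[\s]+[^>]*src\s*=\s*[\"\']?([^\'\" >]+)[\'\" >]/i', $content_text, $matches);
$site_url = get_option('siteurl'); // WordPress: the site's base URL
foreach (array_unique($matches[1] ?? []) as $link) {
    // Skip absolute URLs (http://, https://, ftp://) and root-relative paths (/...).
    // No /e modifier: PHP 7+ rejects it, which makes preg_match() return false.
    if (!preg_match('#^(?:(?:https?|ftp)://|/)#i', $link)) {
        $full_link = $site_url . '/' . $link;
        // Caveat: str_replace() also rewrites any other occurrence of $link in the text.
        $content_text = str_replace($link, $full_link, $content_text);
    }
}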
For a start, you could stop using regular expressions to process HTML, particularly when what you’re doing is so easily done with an HTML parser (PHP has at least three). For example:
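A minimal sketch of that approach using PHP’s DOM extension and parse_url(). The function name absolutize_img_srcs() and the $base parameter are hypothetical; in WordPress, $base would presumably come from get_option('siteurl'). Note that it prefixes every src that lacks a scheme or host, root-relative paths included.

```php
<?php
// Hypothetical helper: prefix every <img> src that has no scheme/host with $base.
// Assumption: $html is a fragment (not a full document) and $base is the site URL.
function absolutize_img_srcs(string $html, string $base): string
{
    $doc = new DOMDocument();
    // Collect parser warnings about imperfect real-world HTML instead of emitting them.
    $prev = libxml_use_internal_errors(true);
    // Wrap the fragment in a <div> so it becomes the root element
    // (NOIMPLIED stops libxml adding <html>/<body>, NODEFDTD skips the doctype).
    $doc->loadHTML('<div>' . $html . '</div>', LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);

    foreach ($doc->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        $url = parse_url($src);
        // Skip empty or malformed values and anything with a scheme or host
        // (http://..., ftp://..., protocol-relative //cdn.example.com/...).
        if ($src === '' || $url === false || isset($url['scheme']) || isset($url['host'])) {
            continue;
        }
        $img->setAttribute('src', rtrim($base, '/') . '/' . ltrim($src, '/'));
    }

    // Serialize only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

Usage: `$content_text = absolutize_img_srcs($content_text, get_option('siteurl'));`. DOMDocument::loadHTML() assumes ISO-8859-1 unless the markup declares an encoding, so UTF-8 content may need an explicit charset hint.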
Problem solved. Well, almost. The case where you add the hostname to relative URLs but not to those beginning with / is a little puzzling and isn’t handled in this snippet, but it’s a relatively minor change (it involves checking $url['path']). See Parse HTML With PHP And DOM, the Document Object Model, parse_url() and http_build_url() (the latter comes from the PECL http extension, not core PHP). PHP has much better tools for this than regular expressions. Oh, and for good measure, read Parsing Html The Cthulhu Way.
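The $url['path'] check mentioned above might look like the following sketch. needs_site_prefix() is a hypothetical helper name; it returns true only for the sources the question wants rewritten (plain relative paths), so it could stand in for the scheme/host test inside a DOM loop.

```php
<?php
// Hypothetical helper implementing the question's rule:
// leave absolute URLs and root-relative paths ("/...") alone, prefix the rest.
function needs_site_prefix(string $src): bool
{
    $url = parse_url($src);
    // Skip empty or malformed values, and anything with a scheme or host
    // (http://..., ftp://..., protocol-relative //cdn.example.com/...).
    if ($src === '' || $url === false || isset($url['scheme']) || isset($url['host'])) {
        return false;
    }
    // The extra check: a path starting with "/" is already root-relative.
    return strpos($url['path'] ?? '', '/') !== 0;
}
```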