Note: I’m using WordPress, but I don’t believe it’s relevant to the answer, so I’ve asked it on SO. If I’m wrong, please tell me/move the question.
Okay, I’m loading blocks of rich content (via WordPress) that often contain many images wrapped in anchor tags. I’d like to step through all of these images and output each one as an a tag with its own img inside.
I’ve already found this handy bit of regex-powered code which gets me the images perfectly well:
// Get all of the post content in a variable
$posttext = $post->post_content;
// We will search for the src="" in the post content
$regular_expression = '~src="[^"]*"~';
$regular_expression1 = '~<img [^\>]*\ />~';
// We will grab all the image src attributes from the post into the array $allpics using preg_match_all
preg_match_all( $regular_expression, $posttext, $allpics );
// Count the number of images found.
$NumberOfPics = count($allpics[0]);
// This time we remove the images from the content, leaving only the text
// ($posttext1 was never defined, so the post content itself is used here)
$only_post_text = preg_replace( $regular_expression1, '', $posttext );
/*Only text will be printed*/
// Check to see if we have at least 1 image
if ( $NumberOfPics > 0 )
{
$this_post_id = get_the_ID();
for ( $i = 0; $i < $NumberOfPics; $i++ )
{
// Strip the leading src=" and the trailing quote to get the bare URL
$str1 = trim( $allpics[0][$i] );
$len = strlen( $str1 );
$imgpath = substr_replace( substr( $str1, 5, $len ), "", -1 );
$theImageSrc = $imgpath;
global $blog_id;
if (isset($blog_id) && $blog_id > 0) {
$imageParts = explode('/files/', $theImageSrc);
if (isset($imageParts[1])) {
$theImageSrc = '/blogs.dir/' . $blog_id . '/files/' . $imageParts[1];
}
}
?>
<img class="alignleft" src='<?php echo get_bloginfo('template_directory').'/timthumb.php?src=' . $theImageSrc . '&h=150&w=150'; ?>' height="150" width="150" alt=""/>
<?php
} // end for
} // end if
?>
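The blogs.dir rewrite inside the loop is easier to reason about, and to reuse in the XPath version further down, as a small function. A minimal sketch, assuming the same multisite upload layout the code above targets; rewrite_multisite_src() is a hypothetical name, not a WordPress function. One deliberate difference: explode() is given a limit of 2, so a path containing /files/ more than once keeps everything after the first occurrence instead of being truncated.

```php
<?php
// Hypothetical helper (not part of WordPress): maps an uploaded-file URL to the
// multisite blogs.dir path, mirroring the explode('/files/') logic in the loop.
function rewrite_multisite_src(string $src, ?int $blogId): string
{
    // Only rewrite when a positive blog ID is known, as in the original check.
    if ($blogId === null || $blogId <= 0) {
        return $src;
    }
    // Limit 2: split on the first '/files/' only, keeping the rest of the path intact.
    $parts = explode('/files/', $src, 2);
    if (!isset($parts[1])) {
        return $src; // no '/files/' segment, so leave the URL unchanged
    }
    return '/blogs.dir/' . $blogId . '/files/' . $parts[1];
}

echo rewrite_multisite_src('http://www.example.com/files/2011/12/foo-150x150.jpg', 3), "\n";
```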
I’d really like to wrap that final img in its original parent a tag, but the regex only captures the src attribute, so the link’s href is lost. Any help here would be greatly appreciated.
An example of the content to be searched might be:
<h5>
<a href="http://www.example.com/imagefoo.jpg">
<img class="size-thumbnail wp-image-4091 alignleft" src="http://www.example.com/imagefoo-150x150.jpg" alt="" width="150" height="150" />
</a>
</h5>
<h5>
<a href="http://www.example.com/Image-Bar.jpg">
<img class="wp-image-4087 alignleft" title="Image - Bar" src="http://www.example.com/Image-Bar-150x150.jpg" alt="" width="150" height="150" />
</a>
</h5>
<h5>
<a href="http://www.example.com/Image-Alphe.jpg">
<img class="wp-image-4090 alignleft" title="Image-Alpha" src="http://www.example.com/Image-Alpha-150x150.jpg" alt="" width="150" height="150" />
</a>
</h5>
<a href="http://www.example.com/EXAMPLE-image-150.jpg"><img class="size-thumbnail wp-image-4088 alignleft" title="EXAMPLE-image-150" src="http://www.example.com/EXAMPLE-image-150-150x150.jpg" alt="" width="150" height="150" /></a>
<h5>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</h5>
<a href="http://www.example.com/insanely-long-permalink-created-as-if-by-a-madman-who-knows-no-bounds-of-shame/" rel="attachment wp-att-2780">
<img class="alignright size-thumbnail wp-image-2780" title="Exhibition Title: Image Name by Artist Person" src="http://www.example.com/wp-content/uploads/2011/12/ExtraordinaryImage-150x150.jpg" alt="Example UK | Exhibition: Image by Artist Person" width="150" height="150" />
</a>
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
EDIT: Here’s the working code for my needs. It uses XPath, following cHao’s answer below. (For what it’s worth, I found Tizag’s XPath primer and this EarthInfo page very useful.)
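// Get all of the post content in a variable
$posttext = $post->post_content;
$this_post_id = get_the_ID();
// loadHTML() is an instance method: calling it statically is deprecated in
// PHP 7 and fails in PHP 8. libxml warnings about imperfect markup are
// collected instead of being printed into the page.
$document = new DOMDocument();
libxml_use_internal_errors(true);
$document->loadHTML($posttext);
libxml_clear_errors();
$xpath = new DOMXPath($document);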
# For each link that has an image inside it, output a lightbox link to the
# same href, wrapping a timthumb thumbnail of the image.
foreach ($xpath->query('//a/img/..') as $link) :
$img = $link->getElementsByTagName('img')->item(0);
$link_src = $link->getAttribute('href');
$link_title = $link->getAttribute('title');
$img_src = $img->getAttribute('src');
$theImageSrc = $img_src;
global $blog_id;
if (isset($blog_id) && $blog_id > 0) {
$imageParts = explode('/files/', $theImageSrc);
if (isset($imageParts[1])) {
$theImageSrc = '/blogs.dir/' . $blog_id . '/files/' . $imageParts[1];
}
}
?>
<a href="<?php echo esc_url( $link_src ); ?>" rel="lightbox[<?php echo $this_post_id; ?>]" title="<?php if ( $link_title ) {
echo esc_attr( $link_title );
} else { the_title_attribute(); } ?>" class="cboxElement">
<img class="alignleft" src='<?php echo get_bloginfo('template_directory').'/timthumb.php?src=' . $theImageSrc . '&h=150&w=150'; ?>' height="150" width="150" alt=""/>
</a>
<?php
endforeach;
?>
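Both snippets build the timthumb query string by plain concatenation, so an image path containing & or a space would corrupt it. A minimal sketch of a hypothetical timthumb_url() helper (not part of timthumb or WordPress) that encodes the parameters with PHP’s http_build_query(); the src, h and w parameter names are taken from the code above.

```php
<?php
// Hypothetical helper: builds the timthumb.php URL used in the templates above.
// http_build_query() percent-encodes each value, so '&', spaces or '/' in the
// image path cannot break the query string.
function timthumb_url(string $templateDir, string $src, int $width, int $height): string
{
    $query = http_build_query(
        ['src' => $src, 'h' => $height, 'w' => $width],
        '',  // no numeric prefix
        '&'  // explicit separator, independent of arg_separator.output
    );
    return rtrim($templateDir, '/') . '/timthumb.php?' . $query;
}

echo timthumb_url('http://www.example.com/wp-content/themes/foo', '/blogs.dir/3/files/a b.jpg', 150, 150), "\n";
```

When the result is echoed into an HTML attribute it still needs escaping, for example with WordPress’s esc_url().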
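You’d be better off not using regular expressions to find the images. They suck at parsing HTML: attribute order and quoting vary, and a pattern like ~src="[^"]*"~ can’t tell you which a element an img sits inside.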
Instead, check out the DOMDocument and DOMXPath classes.
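For example, a minimal sketch of that approach outside WordPress, assuming the post HTML is available as a string. It returns each linked image’s href, src and title, which could then be fed to an output template like the one in the EDIT above. extract_linked_images() is a hypothetical name; '//a[img]' selects the same elements as the '//a/img/..' query.

```php
<?php
// Sketch: collect every <a> that directly contains an <img>.
// extract_linked_images() is a hypothetical helper, not a PHP or WordPress API.
function extract_linked_images(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }
    $document = new DOMDocument();
    // Collect libxml parse warnings (e.g. for HTML5 elements) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $document->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($document);
    $results = [];
    // '//a[img]': links that have an <img> child, in document order.
    foreach ($xpath->query('//a[img]') as $link) {
        // Relative query evaluated against this link: its first <img> child.
        $img = $xpath->query('img', $link)->item(0);
        $results[] = [
            'href'  => $link->getAttribute('href'),
            'src'   => $img->getAttribute('src'),
            // Fall back to the image's title when the link has none.
            'title' => $link->getAttribute('title') ?: $img->getAttribute('title'),
        ];
    }
    return $results;
}

$html = '<h5><a href="http://www.example.com/imagefoo.jpg">'
      . '<img src="http://www.example.com/imagefoo-150x150.jpg" alt="" /></a></h5>'
      . '<p>Some text</p>'
      . '<a href="http://www.example.com/Image-Bar.jpg">'
      . '<img title="Image - Bar" src="http://www.example.com/Image-Bar-150x150.jpg" /></a>';

foreach (extract_linked_images($html) as $pic) {
    echo $pic['href'], ' -> ', $pic['src'], "\n";
}
```

Using a relative query against each link, rather than getElementsByTagName(), restricts the match to direct img children, which is exactly the structure the outer query selected.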