I am trying to create a PHP function that downloads images from a webpage whose URL is passed in as a parameter. The webpage is a kind of gallery that only contains very small thumbnail versions of the images, each linking directly to the larger full JPEG image that I want to download to my local computer. So the images will not be downloaded from the webpage I pass to the function, but from the individual links to the JPEG files on that page.
So for example:
www.somesite.com/galleryfullofimages/
is the location of the image gallery,
and each jpeg image file from the gallery that I want is then located at something like:
www.somesite.com/galleryfullofimages/images/01.jpg
www.somesite.com/galleryfullofimages/images/02.jpg
www.somesite.com/galleryfullofimages/images/03.jpg
What I’ve been trying to do so far is to use the file_get_contents function to get the full HTML of the webpage as a string, and then isolate the value inside the quotes of each <a href="images/01.jpg"> element and put those values into an array. I would then use this array to locate each image and download them all with a loop.
This is what I have done so far:
<?php
$link = "http://www.somesite.com/galleryfullofimages/";
$contents = file_get_contents($link);
$results = preg_split('/<a href="[^"]*"/', $contents);
?>
But I am stuck at this point. I am also totally new to regular expressions, which, as you can see, I tried to use. How can I isolate each image link and then download the image? Or is there a better way of doing this altogether? I have also read about using cURL, but I can’t seem to implement that either.
I hope this all makes sense. Any help will be greatly appreciated.
This is commonly known as “scraping” a website. You are already retrieving the markup for the page, so you are off to a good start.
Here’s what you need to do next:
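Load the markup you already have into a DOMDocument with DOMDocument::loadHTML.
Create a DOMXPath object (PHP's XPath class) for that document.
Call DOMXPath::query with an expression such as //a[@href] to collect the links, and keep the ones whose href ends in .jpg.
Download each link with file_get_contents, which can only fetch URLs when allow_url_fopen is enabled in php.ini, and save the result locally with file_put_contents.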
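As a rough illustration of those steps, here is a minimal sketch rather than a definitive implementation. The function names extractImageLinks and downloadImages and the downloads directory are made up for this example. It also assumes that every relative href is relative to the gallery URL itself, as in the images/01.jpg example from the question; root-relative links such as /images/01.jpg would need proper URL resolution.

```php
<?php
// Hypothetical helper: return the absolute URLs of all JPEG links in $html.
function extractImageLinks(string $html, string $baseUrl): array
{
    $doc = new DOMDocument();
    // Real-world markup is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $links = [];
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $href = $anchor->getAttribute('href');
        // Keep only links ending in .jpg or .jpeg (case-insensitive).
        if (!preg_match('/\.jpe?g$/i', $href)) {
            continue;
        }
        // Assumption: anything without a scheme is relative to the gallery URL.
        if (!preg_match('#^https?://#i', $href)) {
            $href = rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
        }
        $links[] = $href;
    }
    // Drop duplicates and reindex from 0.
    return array_values(array_unique($links));
}

// Hypothetical helper: save each URL into $targetDir under its own file name.
function downloadImages(array $urls, string $targetDir): void
{
    if (!is_dir($targetDir)) {
        mkdir($targetDir, 0777, true);
    }
    foreach ($urls as $url) {
        // Fetching a URL this way requires allow_url_fopen = On in php.ini.
        $data = @file_get_contents($url);
        if ($data === false) {
            continue; // skip images that could not be fetched
        }
        $name = basename(parse_url($url, PHP_URL_PATH));
        file_put_contents($targetDir . '/' . $name, $data);
    }
}

// Example usage (commented out because it performs network requests):
// $link = "http://www.somesite.com/galleryfullofimages/";
// $contents = file_get_contents($link);
// downloadImages(extractImageLinks($contents, $link), __DIR__ . '/downloads');
```

Using the DOM parser instead of a regular expression means attribute order, extra whitespace, or single-quoted attributes in the gallery markup should not break the extraction. If allow_url_fopen is disabled on your host, the file_get_contents call inside downloadImages could be swapped for a cURL request.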