I made a WordPress plugin that sends blog posts as POST data so that I can save the web page. I get the data from the blog using the following query:
select * from $wpdb->posts
The query itself is not important; I mention it only to show how I am getting the blog data.
The data contains HTML markup. I need to parse the HTML to get the URLs of the images. Once I have a URL, I know how to download the image from it. What is a reliable way to parse HTML markup and extract the image URLs, even when the markup is not perfectly formed?
Python is the preferred language.
There are several Python modules that will do this for you:
For example,
results in
urls == ['yourimage1.jpg', 'yourimage2.jpg']
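As a minimal sketch of one way to get that result, here is a version that uses only the standard library's html.parser module. The class name ImageSrcParser, the helper extract_image_urls, and the sample markup are assumptions made for illustration, not part of any particular module's API; a third-party parser may cope better with badly broken markup.

```python
from html.parser import HTMLParser


class ImageSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased;
        # attrs is a list of (name, value) pairs.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags without a usable src
                self.urls.append(src)


def extract_image_urls(html):
    """Return the src values of all <img> tags, in document order."""
    parser = ImageSrcParser()
    parser.feed(html)
    parser.close()
    return parser.urls


# Hypothetical sample markup standing in for a post's content.
sample = '<p>Hello <img src="yourimage1.jpg"> and <img src="yourimage2.jpg" alt="x"/></p>'
urls = extract_image_urls(sample)
print(urls)  # ['yourimage1.jpg', 'yourimage2.jpg']
```

The src values come back exactly as written in the markup, so relative paths would still need to be resolved against the blog's base URL (for example with urllib.parse.urljoin) before downloading.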