I have an HTML file that I want to read using Jsoup and export the results to an Excel sheet. As part of that process, I want to extract the links (the src attributes) of all the images in the HTML file.
Here's the code snippet I used:
```java
File myhtml = new File("D:\\Projects\\Java\\report.html");
//get the string from the file myhtml
String str = getFileString(myhtml);
//getting the links to the images as in the html file
Document doc = Jsoup.parseBodyFragment(str);
Elements media = doc.select("[src]");
//System.out.println(media.size());
for(Element imageLink:media)
{
    if(imageLink.tagName().equals("img"))
        //storing the local link to image as global variable in imlink
        P1.imlink = imageLink.attr("src").toString();
    System.out.println(P1.imlink);
}
}
```
I have two images in the HTML file that I want the links for. However, my code shows the link to only the first image in the file. Please help me find the error in my code!
Try this here:
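A minimal sketch of one way to do this, assuming jsoup 1.11 or later on the classpath. The class and method names (ImageSrcExtractor, extractImageSources, imageSources) are illustrative only. It selects img[src] directly, so only img tags that actually carry a src attribute are matched, and it collects every link into a List instead of a single variable. Reading the file with Jsoup.parse(File, charset) stands in for the question's getFileString helper; UTF-8 is an assumption about the file's encoding.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class ImageSrcExtractor {

    // Collects the src attribute of every <img> element, in document order.
    private static List<String> imageSources(Document doc) {
        List<String> links = new ArrayList<>();
        // "img[src]" matches only <img> tags that have a src attribute,
        // so no separate tagName() check is needed.
        for (Element img : doc.select("img[src]")) {
            links.add(img.attr("src"));
        }
        return links;
    }

    // Parses HTML held in a String.
    public static List<String> extractImageSources(String html) {
        return imageSources(Jsoup.parse(html));
    }

    // Parses HTML straight from a file; UTF-8 is an assumed encoding.
    public static List<String> extractImageSources(File htmlFile) throws IOException {
        return imageSources(Jsoup.parse(htmlFile, "UTF-8"));
    }

    public static void main(String[] args) throws IOException {
        // Pass the HTML file path as the first argument.
        if (args.length == 0) {
            System.err.println("usage: java ImageSrcExtractor <file.html>");
            return;
        }
        for (String link : extractImageSources(new File(args[0]))) {
            System.out.println(link);
        }
    }
}
```

Selecting img[src] also sidesteps a subtle issue in the original loop: the if has no braces, so it guards only the assignment, and the println runs for every element with a src attribute (scripts, iframes and so on), not just images.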
By the way, maybe your problem is the part where you store the link in a global variable. It is overwritten every time the loop runs. A better solution is to store the links in a List, or to leave the loop after the first hit.
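To illustrate the overwrite point without Jsoup, here is a plain-Java sketch in which two hard-coded strings stand in for the parsed src values. The static field imlink mimics the question's P1.imlink; the class and method names are hypothetical. A single field ends up holding only the value from the last iteration, a list keeps every value, and breaking out of the loop keeps only the first.

```java
import java.util.ArrayList;
import java.util.List;

public class OverwriteDemo {

    // Mimics the question's global P1.imlink: one slot, reassigned on every pass.
    static String imlink;

    // Assigns each link to the single field; only the last one survives.
    static String keepInSingleField(List<String> srcs) {
        for (String src : srcs) {
            imlink = src;
        }
        return imlink;
    }

    // Appends each link to a list; every one survives.
    static List<String> keepInList(List<String> srcs) {
        List<String> links = new ArrayList<>();
        for (String src : srcs) {
            links.add(src);
        }
        return links;
    }

    // Keeps only the first link by leaving the loop after the first hit.
    static String firstOnly(List<String> srcs) {
        String first = null;
        for (String src : srcs) {
            first = src;
            break;
        }
        return first;
    }

    public static void main(String[] args) {
        List<String> srcs = List.of("images/first.png", "images/second.png");
        System.out.println(keepInSingleField(srcs)); // images/second.png
        System.out.println(keepInList(srcs));        // [images/first.png, images/second.png]
        System.out.println(firstOnly(srcs));         // images/first.png
    }
}
```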