I am trying to find all image tags on a specific page, for example http://www.chapitre.com
I am using the following code to collect every image on the page:
HtmlPage page = HTMLParser.parseHtml(webResponse, webClient.openWindow(null,"testwindow"));
List<?> imageList = page.getByXPath("//img");
ListIterator li = imageList.listIterator();
while (li.hasNext()) {
    HtmlImage image = (HtmlImage) li.next();
    URL url = new URL(image.getSrcAttribute());
    // For now, only report 1x1 pixels
    if (image.getHeightAttribute().equals("1") && image.getWidthAttribute().equals("1")) {
        System.out.println("This is an image: " + url + " from page " + webRequest.getUrl());
    }
}
This doesn't return all the image tags on the page. For example, an image tag with the attributes src="http://ace-lb.advertising.com/site=703223/mnum=1516/bins=1/rich=0/logs=0/betr=A2099=%5B+%5DLP2" width="1" height="1" should be captured, but it isn't. Am I doing something wrong here?
Any help is really appreciated.
Cheers!
That's because part of your code is throwing an exception 🙂
Try this code:
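The snippet that originally followed is not preserved here, so what follows is only a sketch of one likely fix, not the answerer's actual code. It assumes the exception comes from new URL(image.getSrcAttribute()), which throws MalformedURLException ("no protocol") as soon as a src is relative, ending the loop before later images are reached. The class and method names (ImageSrcResolver, resolveAll, parsesOnItsOwn) are made up for illustration, and plain strings stand in for the values returned by getSrcAttribute().

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: resolves <img> src values against the page URL.
class ImageSrcResolver {

    // Returns true only if src parses as an absolute URL without a base,
    // which is what new URL(image.getSrcAttribute()) requires.
    static boolean parsesOnItsOwn(String src) {
        try {
            new URL(src);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Resolves every src against pageUrl. A src that cannot be parsed is
    // skipped, so one bad value no longer aborts the whole loop.
    static List<String> resolveAll(String pageUrl, List<String> srcValues) {
        List<String> resolved = new ArrayList<>();
        URL base;
        try {
            base = new URL(pageUrl);
        } catch (MalformedURLException e) {
            return resolved; // no usable base URL, nothing to resolve
        }
        for (String src : srcValues) {
            try {
                resolved.add(new URL(base, src).toString());
            } catch (MalformedURLException e) {
                System.err.println("Skipping unparsable src: " + src);
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Sample values; "bogus:pixel" is an invented src with an unknown protocol.
        List<String> srcValues = Arrays.asList(
                "/images/logo.gif",
                "bogus:pixel",
                "http://ace-lb.advertising.com/site=703223/mnum=1516/bins=1/rich=0/logs=0/betr=A2099=%5B+%5DLP2");
        System.out.println("Relative src parses on its own: " + parsesOnItsOwn("/images/logo.gif"));
        for (String url : resolveAll("http://www.chapitre.com/", srcValues)) {
            System.out.println("This is an image: " + url);
        }
    }
}
```

In the HtmlUnit loop, the same idea would mean wrapping the URL construction in a try/catch per image and resolving the src against the page URL. HtmlPage.getFullyQualifiedUrl(String) should do that resolution for you, but check it against the HtmlUnit version you are using.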
You can even select those 1×1 pixel images directly with XPath.
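As a rough illustration of that idea, an attribute predicate such as //img[@width='1' and @height='1'] picks out only the 1×1 images, so the height and width checks in the loop are no longer needed. The sketch below runs the expression with the JDK's built-in javax.xml.xpath API against a small, well-formed sample document, so it works without HtmlUnit. The class name TrackingPixelXPath, the method findPixelSources and the sample markup are invented for the example. With HtmlUnit the same expression string would presumably be passed to page.getByXPath(...).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: selects 1x1 <img> elements with a single XPath expression.
class TrackingPixelXPath {

    // Matches only images whose width and height attributes are both "1".
    static final String ONE_BY_ONE = "//img[@width='1' and @height='1']";

    // Returns the src attribute of every 1x1 image in a well-formed XHTML string.
    static List<String> findPixelSources(String xhtml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        List<String> sources = new ArrayList<>();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    ONE_BY_ONE,
                    new InputSource(new StringReader(xhtml)),
                    XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                sources.add(((Element) nodes.item(i)).getAttribute("src"));
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath on input", e);
        }
        return sources;
    }

    public static void main(String[] args) {
        // Invented sample markup: one ordinary image and one 1x1 pixel.
        String xhtml = "<html><body>"
                + "<img src=\"/images/logo.gif\" width=\"120\" height=\"60\"/>"
                + "<img src=\"http://example.com/pixel.gif\" width=\"1\" height=\"1\"/>"
                + "</body></html>";
        for (String src : findPixelSources(xhtml)) {
            System.out.println("1x1 image: " + src);
        }
    }
}
```

Note that the predicate compares attribute strings, so an image sized to 1×1 through CSS, or with a value like width="1px", would not match.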
Hope this helps.